The importance of measurement in research and technology is indisputable. Measurement is the fundamental mechanism of scientific study and development: it allows us to describe the phenomena of the universe in the exact and general language of mathematics, without which it would be difficult to draw practical or theoretical conclusions from scientific investigation. Measurement lets us understand what is happening and why, and it is critical for estimating progress, success, and failure. Indeed, it is very hard (if not impossible) to control what cannot be understood. Moreover, measurement provides great benefits in domains where it has previously seen little or no use. The economic, political, and social spheres have started to adopt quantitative methods more often. These aspects of modern society are challenging to manage, and managing them requires measurement data, whose collection and analysis by means of smart machines and special tools is becoming ever more accessible.

1.1 Definitions

Measurement is the process by which numbers or symbols are assigned to attributes of entities in the real world in such a way as to describe them according to clearly defined rules (Fenton et al. 1997).

Example

We need to compare the reliability of software systems. Using the GQM approach (see Sect. 4.2 for more details), the authors provide several software measures (metrics) through questions. The following are the questions and metrics that they selected:

  • How likely is the code to have a failure?

    • The number of Modification Requests that are present.

    • The rate at which Modification Requests are issued.

    • The density of Modification Requests over the physical size of the project.

  • What is the rate at which failures are fixed?

    • The percentage of Modification Requests that are fixed.

    • The speed at which Modification Requests are fixed, when they are fixed.

    • A subjective evaluation of the overall fixing process, performed by analyzing the curve of fixing of Modification Requests.

  • What is the rate at which failures are detected?

    • The parameters of the Software Reliability Growth Models.

    • A subjective evaluation of the timings of arrivals of Modification Requests, performed by analyzing the curves of arrivals.
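As an illustration, the simpler metrics above can be computed directly from a modification-request log. The log entries, project size, and observation window below are hypothetical:

```python
from datetime import date

# Hypothetical modification-request (MR) log: (id, opened, fixed or None).
mrs = [
    (1, date(2024, 1, 5), date(2024, 1, 20)),
    (2, date(2024, 1, 12), None),
    (3, date(2024, 2, 2), date(2024, 2, 10)),
    (4, date(2024, 2, 25), None),
]
kloc = 12.5   # physical size of the project in KLOC (assumed)
months = 2    # observation window in months (assumed)

mr_count = len(mrs)                  # number of MRs present
mr_rate = mr_count / months          # rate at which MRs are issued
mr_density = mr_count / kloc         # density of MRs over physical size
fixed = [(o, f) for _, o, f in mrs if f is not None]
percent_fixed = 100 * len(fixed) / mr_count
avg_fix_days = sum((f - o).days for o, f in fixed) / len(fixed)

print(mr_count, mr_rate, mr_density, percent_fixed, avg_fix_days)
```

The subjective evaluations in the list (fixing and arrival curves) have no such direct computation; they require a human analyst.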

1.2 Meaning and Advantages

Measurement plays an important role throughout our lives in the modern world. But measurement for its own sake is not needed. Measurement is one of the steps of research, design, planning, and implementation. Evaluation is essential: it considers the goals and the context and decides what the concrete measure indicates.

The absence of measurement or weak measurement can lead to the following consequences:

  • Lack of measurable targets (Gilb’s principle)

    If you do not know what to measure, you cannot improve anything.

  • Identification failure

    If you identify your goals or metrics wrongly, your results do not help you to improve; moreover, they can even make your processes worse.

  • Lack of quality assurance

    If you cannot measure what you are doing, you cannot ensure quality because you are not capable of controlling the process.

  • Lack of consistent tool evaluation

    Your measurement tools need to be consistent (i.e., be capable of independently measuring the same concepts correctly), so that your measurement can be considered reliable.

There are a lot of benefits acquired through measurement. We can categorize them by professional activity.

There are the following advantages for managers:

  • Cost

  • Productivity

  • Quality

  • User satisfaction

  • Optimization

Indeed, through measurement managers can decrease costs by eliminating useless expenses and increase the productivity of the team and the quality of the produced software, and thus maximize user satisfaction. Optimization of the working process is of the utmost importance if you want to succeed.

Also, let us define the advantages for engineers in the following list:

  • Requirements testing

  • Fault detection

  • Meeting goals

  • Forecasting

By conducting measurement at the early stages of the development process, engineers can save a lot of time and effort by eliminating useless or invalid requirements and detecting existing and possible faults, thereby sticking to the initial goals.

Moreover, customers, developers, and managers can use software metrics to track the evolution of the project in terms of:

  • Hours spent

  • Required quality levels

  • Requests for new requirements

  • Work overtime

  • Quality of work

In discussing the benefits, it is also worth mentioning the following areas within the scope of software measurement:

  • Cost and effort estimation

    Effort is expressed as a function of one or more variables such as the size of the program, the capability of the developers, and the level of reuse. Cost and effort estimation models have been proposed to predict the project cost during the early phases of the software life cycle (e.g., Boehm’s COCOMO model, Putnam’s SLIM model, Albrecht’s function point model).

  • Productivity measures and models

    Productivity can be considered as a function of value and cost. Each can be decomposed into different measurable components: size, functionality, time, money, etc.

  • Data collection

    The quality of any measurement program is clearly dependent on careful data collection. Data collected can be distilled into simple charts and graphs so that managers can understand the progress and problems of the development. Data collection is also essential for scientific investigation of relationships and trends.

  • Quality models and measures

    Quality models have been developed for the measurement of the quality of the product without which productivity is meaningless. These quality models can be combined with productivity models for measuring the correct productivity. These models are usually constructed in a treelike fashion. The upper branches hold important high-level quality factors such as reliability and usability.

  • Reliability models

    Most quality models include reliability as a component factor; however, the need to predict and measure reliability has led to a separate specialization in reliability modeling and prediction. The basic problem in reliability theory is to predict when a system will eventually fail.

  • Performance evaluation and models

    It includes externally observable system performance characteristics such as response times and completion rates and the internal working of the system such as the efficiency of algorithms. It is another aspect of quality.

  • Structural and complexity metrics

    Here we measure the structural attributes of representations of the software, which are available in advance of execution. Then we try to establish empirically predictive theories to support quality assurance, quality control, and quality prediction.

  • Capability-maturity assessment

    This model can assess many different attributes of development including the use of tools, standard practices, and more. It is based on the key practices that every good contractor should be using.

  • Management by metrics

    Measurement plays a vital role in managing the software project. For checking whether the project is on track, users and developers can rely on the measurement-based chart and graph. The standard set of measurements and reporting methods is especially important when the software is embedded in a product where the customers are not usually well-versed in software terminology.

  • Evaluation of methods and tools

    This depends on the experimental design, proper identification of factors likely to affect the outcome, and appropriate measurement of factor attributes.

1.3 Representation Condition

Before defining this condition, one needs to make the following observations.

  • A measurement is a mapping from the empirical world to the formal, relational world (Fenton et al. 1997).

  • A measure is the number or symbol assigned to an entity by this mapping in order to characterize an attribute (Fenton et al. 1997).

Representation Condition

A measurement mapping must map entities into numbers and empirical relations into numerical relations that preserve them and vice versa (Fenton et al. 1997).

The measure is valid if it satisfies the Representation Condition (Fig. 1.1).

Fig. 1.1
figure 1

Representation condition. Based on Fenton et al. (1997, p.31)

Example

LOC satisfies the Representation Condition for physical application size, but it does not do so for functional application size, because a not-so-well-written program may have many more LOC than a shorter program with more or less the same functionality, and vice versa.

1.4 Measurement Characteristics

Measurement should meet the following characteristics:

  • Sensitivity

    An instrument’s ability to accurately measure variability in responses.

    Example

    • A dichotomous response category, such as “agree or disagree,” does not allow the recording of subtle attitude changes.

    • A more sensitive measure, with numerous items on the scale, may be needed. For instance, increasing the number of response categories (strongly agree, agree, neutral, disagree, strongly disagree) increases a scale’s sensitivity.

  • Validity

    The ability of an instrument to measure what is intended to be measured. Validity of the indicator:

    • Is it a true measure?

    • Are we tapping the concept?

    Example

    • We want to establish how well a programmer did his work. If we choose LOC as a measure of the quality of the programmer’s work, will it be a valid measure?

  • Reliability

    It reflects the degree to which an instrument or scale measures the same way each time it is used under the same condition with the same subjects. Two important dimensions of reliability:

    • Repeatability—ability of a measure to remain the same over time despite uncontrolled testing conditions.

    • Consistency—indicator of the homogeneity of the items in the measure that tap the construct. In other words, the items should “hang together as a set” and be capable of independently measuring the same concept.

    Example If we run the same tests several times under the same conditions and the results are the same every time, then the measurement of the program can be considered reliable.

1.5 Kinds of Metrics

Objective and subjective

  • A metric is objective if it can be taken by an automated device. The metric is subjective otherwise.

Examples

  • LOC is an objective metric, and Function Points are subjective.

  • Measuring how well someone can complete a set number of assignments in a controlled environment is objective.

  • Measuring how difficult it was to write the code is subjective.

Direct and indirect

A metric is direct if it can be directly detected and indirect if it is the result of mathematical elaboration on other metrics.

Examples

  • LOC, number of errors, duration of testing process, number of defects discovered during test, time a developer spends on a project, and Functional Points are direct.

  • Number of errors per LOC (error density) is indirect.

1.6 Measurement Scales

The kind of data received defines the relevant measurement scale. Also, the measurement scale defines the relevant statistical method for analyzing actual data and making conclusions from that data. Each type of measurement scale has a specific use.

A measurement scale is a class of mappings that links empirical relations to numerical relations with specific properties.

Each measurement scale should satisfy one or more of the following characteristics:

  • Identity—every number on the measurement scale has to be unique.

  • Ordered relationship—values should have an ordered relationship to one another (magnitude); for example, some values are less than and some are more than others.

  • Equal intervals—scale units are equal to each other. This characteristic means, for instance, that the difference between 10 and 11 should be equal to the difference between 21 and 22.

  • A minimum value of zero—the scale should have a true zero point. There should be no values below this point.

Measurement distinguishes different classes of how to assign symbols to real-world aspects: nominal, ordinal, interval, ratio, and absolute scales.

  • Nominal

    We use a nominal scale if we create categories and assign real-world entities to these categories. Values attached to variables show a category but do not have an original numerical value. For nominal, any 1:1 mapping is OK.

    Example

    Gender. This is a variable that is measured on a nominal scale. People may be categorized as “female” or “male,” but neither value shows less or more “gender” than the other.

  • Ordinal

    If we can rank the categories of symbols so that we can say that something is higher, larger, smaller, etc., we need an ordinal scale. It satisfies identity and magnitude characteristics. Each value on this scale is unique. Also, ordinal scales show the order of the data according to some criteria. For ordinal, the mapping needs to be strictly increasing.

    Example

    Answers for the question “How do you rate our product?” in some test—excellent, very good, good, satisfactory, bad, very bad.

  • Interval

    An ordinal scale is good, but it does not tell us the amount of difference between the categories. Is the difference between excellent and very good the same as between bad and very bad? If we need an answer for this kind of question, we should use an interval scale. An interval scale provides identity, magnitude, and equal interval characteristics. For interval, the mapping must have the form:

    Y = aX + b, with a > 0

    Example

    The Fahrenheit scale to measure temperature: a difference of 5 degrees between 55 and 60 means the same as 5 degrees between 15 and 20. Through an interval scale, one may know not only the difference between values, like bigger or smaller, but also how much bigger or smaller they are.

  • Ratio

    Interval scales define the differences between categories. An example is a set of dates: we can subtract two dates to obtain a meaningful interval, but if we try to multiply or divide two dates, the result will not make any sense. We define a ratio scale as a scale where the mutual proportion between the measurements makes sense. A ratio scale must have a “natural” zero element, that is, an element representing the absence of the property being measured. A ratio scale has all the above properties: identity, magnitude, equal intervals, and a minimum value of zero. For ratio, the mapping must have the form:

    Y = aX, with a > 0

    Example

    The weight of an object, or the LOC of a program.

  • Absolute

    The absolute scale is the most restrictive of all. Absolute measures are counts. The measurement for an absolute scale is made simply by counting the number of elements in the entity set. All arithmetic analysis of the resulting count is meaningful. For absolute, the only acceptable mapping is of the form:

    Y = X

    Example

    The number of if statements in a program, the number of failures in a module, etc.
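The admissible mappings for interval and ratio scales can be checked numerically. This sketch uses the Fahrenheit example from above as the interval-scale case and a kilograms-to-pounds conversion as the ratio-scale case:

```python
# Interval scale: Fahrenheit = 1.8 * Celsius + 32, i.e., Y = aX + b with a > 0.
c1, c2, c3, c4 = 15.0, 20.0, 55.0, 60.0
f = lambda c: 1.8 * c + 32

# Differences are preserved up to the factor a, so equal intervals stay equal:
assert abs((f(c2) - f(c1)) - 1.8 * (c2 - c1)) < 1e-9
assert abs((f(c2) - f(c1)) - (f(c4) - f(c3))) < 1e-9

# Ratios are NOT invariant under an interval-scale mapping (b shifts the zero):
assert abs(c2 / c1 - f(c2) / f(c1)) > 0.1

# Ratio scale: pounds = 2.2046 * kilograms, i.e., Y = aX (zero is preserved).
kg1, kg2 = 10.0, 25.0
lb = lambda kg: 2.2046 * kg
assert abs(kg2 / kg1 - lb(kg2) / lb(kg1)) < 1e-12

print("interval vs ratio checks passed")
```

This is why statements like “40 °F is twice as warm as 20 °F” are meaningless, while “40 kg is twice as heavy as 20 kg” is not.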

1.7 Software Metrics

Software measurements are of two categories, namely, direct and indirect measures.

Direct measures of the software engineering process include cost and effort applied. Direct measures of the product include lines of code (LOC) produced, execution speed, memory size, and defects reported over a set period of time. Indirect measures of the product include functionality, quality, complexity, efficiency, reliability, maintainability, and many other “-abilities.”

1.7.1 Lines of Code (LOC)

LOC, or thousands of LOC (KLOC), is one of the most traditional and direct software metrics.

Most traditional measures are used to quantify software complexity. They are simple to collect, easy to count, and very easy to understand. They do not, however, take into account the intelligence content or the layout of the code. Size-oriented metrics are derived by normalizing quality and productivity measures by the size of the product.

Size-oriented metrics depend on the programming language used. LOC measurement is an older method, developed when FORTRAN and COBOL were very popular. Productivity is defined as KLOC/EFFORT, where effort is measured in person-months. Because productivity depends on KLOC, verbose assembly language code will appear more productive, and the more expressive the programming language, the lower the apparent productivity. The LOC method of measurement is not applicable to projects that deal with visual (GUI-based) programming.

It requires that all organizations use the same method for counting LOC, because some count only executable statements, some include comments, and some do not. So a standard needs to be established, for instance, counting the “;” characters. Once a standard is set, LOC can be computed automatically (an objective metric).
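A minimal sketch of such a counting standard, assuming a C-like input language and counting both non-comment physical lines and “;”-terminated logical statements (the comment handling here is deliberately simplistic):

```python
def count_loc(source: str) -> dict:
    """Count size under two conventions: physical lines (non-blank,
    non-comment) and logical statements (number of ';' characters).
    Comment handling is simplistic: full-line '//' comments only."""
    physical = 0
    logical = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("//"):
            continue  # blank lines and comments are not counted
        physical += 1
        logical += stripped.count(";")
    return {"physical": physical, "logical": logical}

sample = """\
// compute |v| + 1
int v = -4;
if (v < 0) { v = -v; v = v + 1; }

return v;
"""
print(count_loc(sample))  # → {'physical': 3, 'logical': 4}
```

The two conventions already disagree on this tiny fragment, which is exactly why an organization-wide standard must be fixed before LOC-based metrics are compared.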

Based on the LOC/KLOC count of software, many other metrics can be computed:

  • Errors/KLOC

  • $/KLOC

  • Defects/KLOC

  • Pages of documentation/KLOC

  • Errors/PM

  • Productivity =  KLOC/PM (effort is measured in person-months)

  • $/Page of documentation

1.7.2 Cyclomatic Complexity

Cyclomatic complexity v(G) was introduced by Thomas McCabe in 1976. It measures the number of linearly independent paths through a program module (its control flow). McCabe complexity is one of the more widely accepted software metrics; it is intended to be independent of language and language format, and it is considered a broad measure of soundness and confidence for a program.

Code complexity correlates with the defect rate and robustness of the application program. Code with low complexity:

  • Contains fewer errors

  • Is easier and faster to test

  • Is easier to understand

  • Is easier to maintain

Code complexity metrics are used to locate complex code. To obtain high-quality software with low testing and maintenance costs, code complexity should be measured as early as possible during coding. Developers can adapt their code when recommended values are exceeded.

Recommendations:

  • Function length should be 4–40 program lines. A function definition contains at least a prototype, one line of code, and a pair of braces, which makes four lines.

  • A function longer than 40 program lines probably implements more than one function. (Exception: functions containing one selection statement with many branches; decomposing them into smaller functions often decreases readability.)

  • File length should be 4–400 program lines. The smallest entity that may reasonably occupy a whole source file is a function, and the minimum length of a function is four lines.

  • Files longer than 400 program lines (10–40 functions) are usually too long to be understood as a whole.

v(G) could be calculated as:

  • Regions in the graph

  • Independent paths in the graph

  • E − N + 2, where E = number of edges and N = number of nodes

  • P + 1, where P = number of predicate nodes (i.e., if, case, while, for, do)

v(G) = 1 for a program consisting of only sequential statements. For a single function, v(G) is one more than the number of conditional branching points in the function. The greater the cyclomatic number, the more execution paths there are through the function, and the harder it is to understand. For dynamic testing, the cyclomatic number v(G) is one of the most important complexity measures, because it describes the control flow complexity. Obviously, modules and functions with a high cyclomatic number need more test cases than those with a lower cyclomatic number; each function should have at least as many test cases as its cyclomatic number indicates.
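Both formulas can be checked on a small control-flow graph. The graph below, representing a function with one if/else followed by one while loop, is a made-up example:

```python
# Control-flow graph: nodes are basic blocks, edges are possible
# transfers of control for one if/else followed by one while loop.
edges = [
    ("entry", "cond"),   # entry -> if condition
    ("cond", "then"),    # true branch
    ("cond", "else"),    # false branch
    ("then", "loop"),
    ("else", "loop"),
    ("loop", "body"),    # loop condition true
    ("body", "loop"),    # back edge
    ("loop", "exit"),    # loop condition false
]
nodes = {n for edge in edges for n in edge}

E, N = len(edges), len(nodes)   # E = 8 edges, N = 7 nodes
v_g = E - N + 2                 # v(G) = E - N + 2
P = 2                           # predicate nodes: 'cond' (if) and 'loop' (while)
assert v_g == P + 1             # the two formulas agree
print(v_g)                      # → 3
```

Three linearly independent paths means at least three test cases for this function, matching the testing rule stated above.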

The cyclomatic number of a function should be less than 15. If a function has a cyclomatic number of 15, there are at least 15 (but probably more) execution paths through it. More than 15 paths are hard to identify and test. Functions containing one selection statement with many branches make up an exception. A reasonable upper limit cyclomatic number of a file is 100.

1.7.3 Fan In and Fan Out

The Fan In of a module is the amount of information that “enters” a module. The Fan Out of a module is the amount of information that “exits” a module. We assume all the pieces of information are of the same size. Fan In and Fan Out can be computed for functions, modules, objects, and also non-code components.

Usually parameters passed by values and external variables used before being modified count toward Fan In. External variables modified in the block and returned values count toward Fan Out. Parameters passed by reference depend on their usage.

Structural Fan In (SFIN) and Fan Out (SFOUT) values measure the relationships between files and between procedures. They measure the complexity of the static (design-time) structure of code.

A useful way to look at SFIN and SFOUT is to view them as graph metrics. Visualize procedures (or files) as nodes and calls between them as links. Fan In is the number of links coming into a node; Fan Out is the number of links going out of a node.

Fan metrics for files and procedures are related to each other, but they are counted with different rules.

For procedures, structural Fan In and Fan Out are calculated from the procedure call tree:

  • SFIN (procedure) =  number of procedures that call this procedure

  • SFOUT (procedure) =  number of procedures this procedure calls

A high SFIN indicates a heavily used procedure, while a low SFIN is the opposite. A high SFOUT means the procedure calls many others. A procedure with SFOUT = 0 is a leaf procedure that depends on no other procedures (it may depend on the data it reads, though).

SFIN = 0 indicates that no callers of the procedure were found in the analysis. It does not necessarily indicate a dead procedure, however: the procedure may be in use via an invisible call, such as an event handler triggered by user action. A dead-code detection feature, such as the one in Project Analyzer, can help find and remove truly dead code.

For files, structural Fan In and Fan Out are calculated from the file dependency tree:

  • SFIN (file) =  number of files that depend on this file

  • SFOUT (file) =  number of files this file depends on

A file depends on another file if it requires the other file to compile or run. It may call procedures in the other file, read/write its variables, access its constants, or use its class, interface, UDT, or enum declarations. A high SFIN indicates a heavily used file, while a low SFIN indicates the opposite. A high SFOUT means the file depends on many others. A file with SFOUT= 0 is independent of others.
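The procedure-level rules above can be sketched over a hypothetical call graph (all procedure names below are invented for illustration):

```python
from collections import defaultdict

# Hypothetical procedure call tree: procedure -> procedures it calls.
calls = {
    "main":      ["parse", "report"],
    "parse":     ["read_line", "tokenize"],
    "report":    ["tokenize"],
    "read_line": [],
    "tokenize":  [],
}

# SFOUT(procedure) = number of procedures this procedure calls.
sfout = {proc: len(callees) for proc, callees in calls.items()}

# SFIN(procedure) = number of procedures that call this procedure.
sfin = defaultdict(int, {proc: 0 for proc in calls})
for callees in calls.values():
    for callee in callees:
        sfin[callee] += 1

print(dict(sfin))   # 'tokenize' has SFIN 2: called from two places
print(sfout)        # 'read_line' and 'tokenize' are leaves (SFOUT 0)
```

Here `tokenize` is the reused routine, while `main` has SFIN 0, which in a real analysis would prompt a check for invisible callers before declaring it dead (it is, of course, the entry point).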

An SFIN value of 2 or more indicates reused code. The higher the fan in, the more reuse.

A high SFIN is desirable for procedures because it indicates a routine that is called from many locations. Thus, it is reused, which is usually a good objective.

A high SFIN is not as desirable for a file. While it can indicate good reuse, it also represents a high level of cross-file coupling. SFIN for a file should be “reasonable.” We leave the definition of “reasonable” to be determined case by case.

A special use for procedure-level SFIN is the detection of procedures that could be inlined. If a procedure’s SFIN is low but positive, it has a small number of callers. Depending on what the procedure does and how complex it is, you could possibly embed it within the caller(s). This is a speed optimization technique. We do not recommend inlining for usual coding because keeping the code modular improves reuse and legibility.

A high SFOUT denotes strongly coupled code. The code depends on other code and is probably more complex to execute and test.

A low or zero fan out means independent, self-sufficient code. This kind of code is easier to reuse in another project or for another purpose. A file whose SFOUT= 0 is a leaf file in the project. You can include it in another project as such and it will most probably continue to work the same way.

To evaluate the average coupling between files, monitor the average SFOUT/file value. This is the average of “how many other files my files depend on.” Try to keep this value low. Achieving a low cross-file coupling should be done via restructuring and planning. It should not be achieved by just mechanically joining files—so keep an eye on the file sizes as well. Notice that SFOUT/file is likely to be higher in a large system because the parts of the system need to interact with each other. As your project grows, SFOUT/file is likely to increase even if your code is well designed.

1.7.4 Maintainability Index (MI)

The Maintainability Index is calculated with a formula combining LOC, McCabe complexity, and Halstead measures. It indicates when it becomes cheaper and/or less risky to rewrite the code instead of changing it.

There are two variants of the Maintainability Index: one that considers comments (MI) and one that does not (MIwoc). In fact there are three measures:

  • MIwoc: Maintainability Index without comments

  • MIcw: Maintainability Index comment weight

  • MI: Maintainability Index = MIwoc + MIcw

$$\displaystyle \begin{aligned} MIwoc = 171 - 5.2 \cdot \ln(aveV) - 0.23 \cdot aveG - 16.2 \cdot \ln(aveLOC),\end{aligned} $$

where aveV is average Halstead Volume V per module, aveG is average extended cyclomatic complexity v(G) per module, aveLOC is average count of lines LOCphy per module.

$$\displaystyle \begin{aligned} MIcw = 50 \cdot \sin\left(\sqrt{2.4 \cdot perCM}\right),\end{aligned} $$

where perCM is average percent of lines of comments per module.

Maintainability Index (MI, with comments) value:

  • 85 and above means good maintainability

  • 65–85 means moderate maintainability

  • < 65 means difficult to maintain

    • With really bad pieces of code (big, uncommented, unstructured), the MI value can even be negative.
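The formulas and thresholds above can be transcribed directly. Treating perCM as a fraction rather than a percentage, and the sample input values, are assumptions made here for illustration:

```python
import math

def maintainability_index(ave_v, ave_g, ave_loc, per_cm):
    """MI = MIwoc + MIcw. ave_v: average Halstead Volume, ave_g: average
    extended cyclomatic complexity, ave_loc: average LOCphy per module,
    per_cm: average comment share per module (taken as a fraction here)."""
    miwoc = (171 - 5.2 * math.log(ave_v)
                 - 0.23 * ave_g
                 - 16.2 * math.log(ave_loc))
    micw = 50 * math.sin(math.sqrt(2.4 * per_cm))
    return miwoc + micw

# Hypothetical module averages:
mi = maintainability_index(ave_v=250.0, ave_g=5.0, ave_loc=40.0, per_cm=0.15)
band = "good" if mi >= 85 else ("moderate" if mi >= 65 else "difficult")
print(round(mi, 1), band)
```

Note that the sine argument is in radians; with perCM interpreted as a raw percentage instead of a fraction, the comment bonus can even come out negative, so the convention must be fixed before comparing MI values across tools.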

1.7.5 Quality Metrics

Software metrics can be classified into three categories:

  1. Product metrics—These describe the characteristics of the product such as size, complexity, design features, performance, and quality level.

  2. Process metrics—These characteristics can be used to improve the development and maintenance activities of the software.

  3. Project metrics—These describe the project characteristics and execution.

1.7.5.1 Product Quality Metrics

  1. Mean Time to Failure

    This is the average time between failures. This metric is mostly used with safety-critical systems such as airline traffic control systems, avionics, and weapons.

  2. Defect Density

    This measures the defects relative to the software size, expressed as lines of code, function points, etc.; that is, it measures code quality per unit of size. This metric is used in many commercial software systems.

  3. Customer Problems

    This measures the problems that customers encounter when using the product. It captures the customer’s perspective on the problem space of the software, which includes non-defect-oriented problems together with defect problems.

    The problems metric is usually expressed in terms of Problems per User-Month (PUM):

    PUM = Total problems that customers reported (true defects and non-defect-oriented problems) for a time period ÷ Total number of license-months of the software during the period, where Number of license-months of the software = Number of installed licenses of the software × Number of months in the calculation period. PUM is usually calculated for each month after the software is released to the market, and also as monthly averages by year.

  4. Customer Satisfaction

    Customer satisfaction is often measured by customer survey data on a five-point scale from Very dissatisfied to Very satisfied.

Satisfaction with the overall quality of the product and its specific dimensions is usually obtained through various methods of customer surveys. Based on the five-point-scale data, several metrics with slight variations can be constructed and used, depending on the purpose of analysis: percent of completely satisfied customers, percent of satisfied customers, percent of dissatisfied customers, percent of non-satisfied customers.
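The PUM definition above can be sketched numerically; all counts below are hypothetical:

```python
# PUM = total problems reported in the period
#       / (installed licenses * months in the period).
problems_reported = 90       # defect and non-defect problems (assumed)
installed_licenses = 1500    # assumed
months_in_period = 3         # assumed

license_months = installed_licenses * months_in_period   # 4500
pum = problems_reported / license_months
print(pum)   # → 0.02 problems per user-month
```

Normalizing by license-months rather than raw problem counts is what makes PUM comparable across releases with very different install bases.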

1.7.5.2 Process Quality Metrics

Based on Lean Six Sigma, the following criteria characterize process quality:

  1. 1.

    Cp

    Cp is a measure of potential process capability. It is the ratio of the six sigma spread of a process distribution to the tolerance of that distribution. The process must be normally distributed and stable in order to assess Cp. Cp gives the maximum process capability (Cpk) if the process is centered exactly in the middle of the tolerance.

  2. 2.

    Cpk

    Cpk is a measure of the actual process capability. It is calculated by dividing the distance of the process mean to the nearest tolerance limit by 3 standard deviations of the process. Again, the process must be normally distributed and stable before assessing Cpk. See the Statistical Process Control section of the Toolbox for additional help on this subject.

  3. 3.

    First Pass Yield

    Percentage of units that meet specifications without any rework or repair. This is a commonly used measurement but has dubious value for two reasons: (A) rework and repair is often “hidden”—takes place up the line but is not recorded, and (B) multiple defects occurring on a single unit are not captured.

  4. 4.

    Defects Per Unit

    Total number of defects identified on all units divided by the number of units. This metric gives a better measure of quality than First Pass Yield because it captures all the defects. Care must be taken to capture “hidden” rework and repairs that may take place up the line or prior to the reporting point. This metric is also more readily convertible to Defects Per Million Opportunities for Six Sigma projects.

  5. 5.

    Defects Per Million Opportunities (DPMO)

    This is a primary Six Sigma metric. Defects per opportunity is used instead of defects per unit to facilitate more direct comparisons between processes with varying levels of complexity. Assembling an automobile is far more complex than manufacturing a patio stone, with far more opportunities for error, so defects per unit is a poor basis for comparing the capability of the manufacturing process.

    Defects are a failure of the process to meet a “Critical to Quality Characteristic”—that is to say, a characteristic that customers care about. The number of opportunities must be determined based upon these “Critical to Quality Characteristics” and should be based upon a well-reasoned process. Inflating the number of opportunities will lower the ratio of defects to opportunities and bias the sigma level upward. At the level of three defects per million opportunities, the process is said to have achieved Six Sigma status.

    An important point to remember when working with DPMO is that customers do not buy opportunities, they buy units. Furthermore, all defects are not created equal, even if they are important to customers. For example, customers care about paint flaws on a car, but they care a lot more about a defect that causes the car not to start. Accordingly, it may be useful to categorize defects into different categories by process and assess the Six Sigma level of each process (e.g., Paint vs. Ignition System).

  6. 6.

    Fill Rate

    Percentage of units ordered that are shipped on a given order. If an order for 10 widgets and 10 sprockets is filled, but only 9 of the widgets are shipped due to a product shortage, then the fill rate is 95%.

  7. 7.

    Line Item Fill Rate

    Percentage of line items, or SKUs, that are shipped on a given order. If an order for 10 widgets and 10 sprockets is filled, but only 9 of the widgets are shipped due to a product shortage, then the line item fill rate is 50%, because only one of the two line items (SKUs) was shipped 100% complete.

  8.

    Shipment On-Time %

    Percent of shipments made on or before the due date.

  9.

    Shipping Errors Per Shipment

    Total number of shipping errors (by line item) for a period divided by the number of shipments made during that same period.

  10.

    Warranty Percent of Sales

    Warranty dollars paid during a period divided by the net sales for that same period.

  11.

    Warranty Claims per Unit

    Total number of warranty claims (not dollars) received during a period divided by the number of units sold during the same period.

  12.

    Survey Complaints (TGWs) per Unit (or per 1000)

    The number of complaints, or “Things Gone Wrong,” reported on a customer survey divided by the total number of units included in the survey responses. This metric may also be expressed as complaints per 100 units, 1000 units, or even 1,000,000 units.
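The three customer-experience ratios defined in items 10–12 can be sketched as follows; all figures here are made-up illustrations, not data from the text:

```python
# Warranty and survey ratios from items 10-12; every number is an assumed
# example chosen only to show the arithmetic.
warranty_dollars = 12_500.0
net_sales = 1_000_000.0
warranty_pct_of_sales = warranty_dollars / net_sales * 100  # 1.25 (%)

claims = 40
units_sold = 8_000
warranty_claims_per_unit = claims / units_sold  # 0.005

tgws = 36
surveyed_units = 1_200
tgws_per_1000 = tgws / surveyed_units * 1000  # 30.0 complaints per 1000 units
```

Scaling TGWs to a per-1000 (or per-1,000,000) basis, as the text notes, simply multiplies the per-unit ratio by the chosen base.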

1.8 Rationale for Noninvasive Measurement

There are two generations in the history of software metrics collection (Johnson et al. 2003). The first generation employs the Personal Software Process (PSP), a self-improvement method that helps developers control, manage, and improve their workflow (Humphrey 2005). The PSP is often described as an “invasive” approach to data collection, since it requires the direct involvement of participants: users fill out and print forms that log their effort, size, and defect information. Because the developer must switch context and fill in these forms manually, the approach introduces substantial form-filling overhead (Rogers et al. 1995) and consumes considerable resources.

The term “noninvasive” describes the second generation of software metrics collection: software engineers employ approaches in the development process that do not require their personal engagement in data collection (Janes et al. 2014). Table 1.1 (Johnson et al. 2003) summarizes the differences between invasive and noninvasive approaches. A noninvasive technique clearly reduces the costs of data gathering and processing, as well as the burden of context switching.

Table 1.1 Characteristics of invasive and noninvasive measurements

To satisfy the characteristics shown in Table 1.1, a noninvasive collection system should focus on the following aspects:

  • Automatic collection of product metrics

  • Support of the tools that are used by the developers

  • Support of the programming language used by the developers

  • Automatic installation and update of the tools for data collection
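A minimal illustration of the first aspect, automatic collection of product metrics, might be a script that measures source size without any developer involvement. Everything here (the path handling, the file extension, the definition of a line of code) is a hypothetical sketch, not a tool from the text:

```python
# Hypothetical noninvasive product-metric collector: walks a source tree and
# counts non-blank lines of code, requiring no action from the developer.
from pathlib import Path


def count_loc(root: str, suffix: str = ".py") -> int:
    """Count non-blank lines in all files under `root` with `suffix`."""
    total = 0
    for path in Path(root).rglob(f"*{suffix}"):
        with open(path, encoding="utf-8", errors="ignore") as f:
            total += sum(1 for line in f if line.strip())
    return total
```

In practice, such collectors would run as editor plugins, build steps, or repository hooks, so that installation and updates are also automatic, as the last bullet above requires.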

1.9 Conclusion

To assess the quality of an engineered product or system and to better understand the models that are created, measures are collected throughout the software development life cycle with the intention of continuously improving the software process. Measurement supports estimation, quality control, productivity assessment, and project control throughout a software project. Software engineers also use measurement to gain insight into the design and development of work products, and it assists in strategic decision-making as a project proceeds.

The cost and effort required to build software, the number of lines of code produced, and other direct measures are relatively easy to collect, as long as specific conventions for measurement are established in advance. However, the quality, functionality, efficiency, and maintainability of software are more difficult to assess and can be measured only indirectly.

We partition the software metrics domain into process, project, and product metrics. We have also noted that product metrics that are private to an individual are often combined to develop project metrics that are public to a software team. Project metrics are then consolidated to create process metrics that are public to the software organization as a whole.