Previous studies indicated that applying PDM is effective for corrective and preventive maintenance (Mendonça et al., 2018) and that the custom static analysis rules produced can be reused in other applications (Mendonça & Kalinowski, 2020). However, only a few maintainers had applied the method. Their feedback about the PDM experience was positive, strengthening our confidence that PDM could help other maintainers, but it was insufficient to determine whether they would be able to apply PDM effectively, whether they would accept the technology, and which challenges they would face. Aligned with the methodology for introducing software processes described by Shull et al. (2001), and given that the feasibility of PDM had already been determined at this point, the next step was to conduct an observational study.
Hence, our research objective is to evaluate the effectiveness and acceptance of PDM from the viewpoint of different maintainers, answering the following research questions:
RQ1. What are the challenges faced by maintainers while applying PDM for creating custom static analysis rules?
RQ2. Would maintainers accept using PDM?
We conducted an observational study of PDM application using five groups of novice maintainers with different levels of experience and knowledge. The first group served as a pilot for instrument validation and was composed of computer science graduate students (n = 9). The other four groups were composed of computer science undergraduate students. Students from groups A (n = 27), C (n = 18), and D (n = 10) had no previous experience with the software under investigation and limited experience with the involved technologies (JEE), whereas students from group B (n = 18) had previous experience with the software and were more familiar with its technologies.
Groups A and B were trained in and applied the two main reasoning steps involved in PDM, namely identifying defect patterns and evaluating them for improvement. Additionally, for group B, we conducted the static analysis rule programming task to investigate whether maintainers would be able to implement the rule directly using the source code Abstract Syntax Tree (AST). Unfortunately, even with a short training session, none of them was able to correctly implement the rule. We concluded that expecting developers who are not rule experts to develop defect rules directly using the AST was unrealistic. Therefore, we redesigned the second task to use a DSL with abstractions that ease rule programming (Crispe & Mendonça, 2021) and conducted new trials with groups C and D to further investigate the feasibility of rule programming.
We collected the results of applying those tasks and the maintainers' feedback on the difficulties found. We also used TAM (Davis, 1989) to assess the maintainers' acceptance of PDM along its three dimensions: ease of use, usefulness, and intention to use.
The remainder of this section is organized according to the planning part of the guidelines for reporting experiments described by Jedlitschka et al. (2008).
The research objective of this study is to evaluate the effectiveness and acceptance of PDM from the viewpoint of maintainers. Following the GQM template (Basili et al., 1994), we have the following goal:
Analyze PDM for the purpose of characterization with respect to the challenges faced in conducting its steps, perceived usefulness, ease of use, and intention to use, from the point of view of maintainers, in the context of computer science students applying the PDM steps on excerpts of artifacts from a real and specific software product.
We selected the subjects of the study by convenience. We had access to graduate and undergraduate students in courses related to software quality at two different Brazilian universities. The first group was composed of nine graduate students in informatics from the Pontifical Catholic University of Rio de Janeiro (PUC-Rio). We called this group pilot since its main purpose was to help validate our materials. The other four groups, A (n = 27), B (n = 18), C (n = 18), and D (n = 10), were composed of undergraduate students in computer science from PUC-Rio (A and C) and from the Federal Center of Technological Education Celso Suckow da Fonseca (CEFET/RJ) (B and D). Students of groups A and C were enrolled in a course on software testing and measurement, offered in the second year of their program, whereas students of groups B and D were enrolled in a course on software engineering, offered in the third year of their program.
One relevant difference between the groups was that group B had previous experience with the software on which they would apply PDM. This previous experience was possible because that software was used in the final assignment of the course in which group B students were enrolled. By the time the students performed the study tasks, the assignment had already been handed out to them.
All experiment materials are available online in our replication package³ in the zenodo.org open science repository. A description of these materials is provided hereafter.
The characterization of students was performed through a form with questions about their experience (in months) with software development and maintenance in different contexts (for their own use, in a course, and in industry). We also included questions about their level of experience with techniques and technologies that could influence the results of the experiments: Java, JEE, stack trace reading, static analysis rule programming, and source code inspections, as well as their proficiency in the English language.
The software for applying PDM was selected by convenience. The selected Internship and Employment Management System (SisGEE)⁴ is an information system developed by students as an assignment of a web programming course of a computer science program at CEFET/RJ. SisGEE was developed using JEE technology and contained some defect patterns of unhandled latent exceptions in its source code.
We exercised some of those unhandled exceptions to produce a log for the PDM application (see Table 1). The version of SisGEE used to produce this log is available on GitHub.⁵ The log contains two failures produced by an invalid conversion from string to integer (NumberFormatException), two failures produced by using a null value returned by the service layer without previously checking it (NullPointerException), and two other failures that do not form any pattern.
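To make the first defect pattern concrete, the following minimal sketch contrasts an unhandled conversion with a fixing alternative that handles the exception within the same method. The code is a hypothetical illustration, not taken from SisGEE:

```java
// Hypothetical illustration of the NumberFormatException defect pattern
// (not actual SisGEE code): the unsafe variant lets an invalid string
// propagate the exception, while the fixed variant handles it locally.
public class ConversionPattern {

    // Defect pattern: Integer.parseInt called with no handling, so an
    // invalid string throws a NumberFormatException to the caller.
    static int parseUnsafe(String value) {
        return Integer.parseInt(value);
    }

    // Fixing alternative: handle the exception in the same method and
    // fall back to a default value.
    static int parseSafe(String value, int fallback) {
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseSafe("42", 0));   // valid input -> 42
        System.out.println(parseSafe("abc", -1)); // invalid input -> fallback -1
        try {
            parseUnsafe("abc");
        } catch (NumberFormatException e) {
            System.out.println("unhandled pattern raised NumberFormatException");
        }
    }
}
```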
The first task of the study (task 1) consists of executing the first PDM step, i.e., failure analysis and defect pattern identification. Failure identification consists of extracting failure data from the logs and filling in a provided form. The groups of maintainers that participated in this task (pilot, A, and B) received the same form for failure identification. The data to be extracted consist of the file name and line where the exception was thrown, as well as the exception type and the error message contained in the failure. After performing failure identification, maintainers were instructed to use the extracted data to compare failures and identify similar ones.
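The extraction step can be sketched as follows. The regular expressions assume the standard JVM stack trace layout; the package and class names in the example trace are hypothetical, not taken from the SisGEE log:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the failure-identification step of task 1: pull the
// exception type, error message, file name, and line number out of a
// Java stack trace, i.e., the fields of the failure identification form.
public class FailureExtractor {

    static final Pattern HEADER = Pattern.compile("(\\S+Exception): (.*)");
    static final Pattern FRAME  = Pattern.compile("\\tat \\S+\\((\\S+\\.java):(\\d+)\\)");

    // Returns a summary of the failure data found in one log line,
    // or null if the line carries no relevant data.
    static String extract(String line) {
        Matcher h = HEADER.matcher(line);
        if (h.find()) return "type=" + h.group(1) + " message=" + h.group(2);
        Matcher f = FRAME.matcher(line);
        if (f.find()) return "file=" + f.group(1) + " line=" + f.group(2);
        return null;
    }

    public static void main(String[] args) {
        String log =
            "java.lang.NumberFormatException: For input string: \"abc\"\n"
          + "\tat forms.FormController.convert(FormController.java:42)\n";
        for (String line : log.split("\n")) {
            String data = extract(line);
            if (data != null) System.out.println(data);
        }
    }
}
```

Failures whose extracted type, file, and line coincide (or whose types coincide at different locations) are the candidates for the similarity comparison described above.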
Thereafter, maintainers were instructed to inspect the source code related to similar failures to identify patterns formed by the defects. If a defect pattern was identified, maintainers should document it. The pilot group received a form with separate fields for the information useful for identifying a defect pattern, whereas groups A and B were trained in a pattern language and should document the defect patterns using that language. Table 2 presents an example of this form filled in. The pattern language used by groups A and B follows the syntax of the software's programming language (Java), extended with wildcard symbols and conventions for documenting the pattern. Table 6 presents the wildcards and conventions available in the pattern language, while Table 7 presents an example of a defect pattern documented in this language.
After performing each task of the study, the maintainers were asked to fill in a follow-up questionnaire about their strategies and perceptions regarding the task. The questionnaire used for the first task was the same for all groups of maintainers. Its questions concerned the strategy used to identify the defect pattern, whether the allotted time was perceived as sufficient to complete the task, the confidence in the reported patterns, the ease of performing the task, and the difficulties found.
Task 2 consisted of programming a static analysis rule. For this task, one defect pattern documentation was provided and the maintainers were asked to program a static analysis rule that locates the instances of this defect pattern. The provided defect pattern documentation was the one presented in Table 2, which concerns a NumberFormatException. We understand that such an exception could also be handled in caller methods/functions. However, the intention of the study was to evaluate the factors that influence developing simple rules. Thus, the pattern described in Table 2 considered only exception handling within the same method.
The groups that participated in task 2 were groups B, C, and D. For group B, we asked participants to implement the rule directly in Java using the source code abstract syntax tree and the Visitor design pattern (Gamma et al., 1995), an approach commonly applied in static analysis tools. However, none of them was able to correctly implement the rule. Therefore, we redesigned task 2 to use a DSL with abstractions that ease rule programming for the trials conducted with groups C and D. The tool selected for static analysis rule programming was SCPL (Crispe & Mendonça, 2021), which supports implementing rules for Java programs using markups in code pattern examples. After finishing the task, the maintainers were asked to provide the source code of the programmed static analysis rule and fill in the follow-up questionnaire, which follows the same template as that of task 1.
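The Visitor-based approach that group B attempted can be sketched with a deliberately simplified, self-contained toy AST; real tools traverse the full Java AST produced by a parser, so the node types and rule below are illustrative assumptions, not the actual study material:

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of the Visitor approach used by static analysis
// tools: a simplified AST with method-call and try-statement nodes, and
// a rule visitor that flags Integer.parseInt calls outside any try
// block. The node model is a simplification, not a real Java AST.
public class RuleSketch {

    interface Node { void accept(Visitor v); }

    static class MethodCall implements Node {
        final String name; final int line;
        MethodCall(String name, int line) { this.name = name; this.line = line; }
        public void accept(Visitor v) { v.visit(this); }
    }

    static class TryStmt implements Node {
        final List<Node> body;
        TryStmt(List<Node> body) { this.body = body; }
        public void accept(Visitor v) { v.visit(this); }
    }

    interface Visitor { void visit(MethodCall c); void visit(TryStmt t); }

    // The rule: report lines of parseInt calls not wrapped in a try.
    static class UnhandledParseIntRule implements Visitor {
        final List<Integer> alerts = new ArrayList<>();
        private boolean insideTry = false;

        public void visit(MethodCall c) {
            if (!insideTry && c.name.equals("Integer.parseInt")) alerts.add(c.line);
        }

        public void visit(TryStmt t) {
            boolean saved = insideTry;
            insideTry = true;               // calls in the body are handled
            for (Node n : t.body) n.accept(this);
            insideTry = saved;
        }
    }

    public static void main(String[] args) {
        List<Node> method = List.of(
            new MethodCall("Integer.parseInt", 10),                        // unhandled -> alert
            new TryStmt(List.of(new MethodCall("Integer.parseInt", 20)))); // handled -> no alert
        UnhandledParseIntRule rule = new UnhandledParseIntRule();
        for (Node n : method) n.accept(rule);
        System.out.println(rule.alerts); // alerts only the unhandled call
    }
}
```

Even in this reduced form, the rule author must manage traversal state (the `insideTry` flag) by hand, which suggests why implementing rules directly over a full AST proved unrealistic for non-experts.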
Finally, task 3 comprised the PDM steps of rule evaluation and context analysis. To perform this task, we provided the documentation of one defect pattern, the source code of the application containing this defect pattern, and a list of source code lines in this application that were alerted by a static analysis rule implementing the defect pattern. Table 2 presents the provided defect pattern documentation. The application source code was the same as in the other tasks. The alerted source code lines were provided to the maintainers in a form.
During task 3, maintainers should classify the provided alerts as defects or false positives. If a false positive was found, they should report which fixing alternative was present in the source code. When finding new fixing alternatives, maintainers should document them. The pilot group documented the fixing alternatives using a form, while groups A and B used the pattern language. After performing the task, the maintainers were asked to fill in the follow-up questionnaire, which is similar to those of the other tasks.
At the end of the study, the maintainers were asked to fill in the TAM questionnaire. This questionnaire is composed of nine questions split into three dimensions: usefulness, ease of use, and intention to use. The answers are provided on a five-point Likert scale ranging from strongly disagree to strongly agree. The questions of the TAM questionnaire, adjusted to our study, are presented in Table 8.
Question refinement and variables
The first knowledge question we wanted to answer (RQ1) concerned the challenges faced by maintainers, i.e., for each task of the study, we wanted to understand how effective maintainers are in performing the task and which difficulties they find. We used the percentage of maintainers that completed each task successfully to gain insight into how easy it is for a maintainer to effectively perform the task. With this in mind, we aim to answer the following more detailed questions regarding effectiveness:
RQ1. What are the challenges faced by maintainers while applying PDM for creating custom static analysis rules?
What are the challenges faced by maintainers while identifying and documenting defect patterns?
What are the challenges faced by maintainers while programming a custom static analysis rule?
What are the challenges faced by maintainers while identifying and documenting fixing alternatives present in false positives of a rule?
Certain knowledge and experience might influence applying PDM, and we were interested in gaining insight into how they affect the effectiveness of maintainers. Therefore, in our study, we additionally investigated whether knowledge and experience in Java, JEE, static analysis rule programming, stack trace reading, and source code inspection influence applying PDM, as well as the maintainers' previous experience with software development, software maintenance, and with the software used in the study.
The second knowledge question we wanted to answer (RQ2) concerned the acceptance of PDM by maintainers. We used the TAM questionnaire (see Table 8) to evaluate this acceptance. Based on the TAM constructs, we answer the following questions:
RQ2. Would maintainers accept using PDM?
How do maintainers perceive PDM regarding its ease of use?
How do maintainers perceive PDM regarding its usefulness?
Do maintainers intend to use PDM after experimenting with it?
As TAM poses positive statements about the technology (see Table 8), we wanted to know how frequently maintainers agree with them. Additionally, we wanted to better understand the difficulties found by maintainers during PDM application, since the frequency of certain difficulties might indicate their importance and reveal improvement opportunities for PDM.
Tables 9 and 10, respectively, describe the set of independent and dependent variables together with their types and scales.
Experimental procedures and operation
The study started with the proper preparation of a laboratory with computers and the Netbeans IDE so that the subjects would be able to perform the tasks. As soon as subjects arrived, they received the consent form and the characterization form. After filling in these forms, a 30-min introductory presentation about the PDM method was held, followed by a 20-min training on the task 1 activities.
This training included learning how to identify the data to be extracted from the error logs, how to compare these data to identify similar failures, and how to compare similar failures in the source code to identify and document a defect pattern. The training of the pilot group was slightly different from that of groups A and B because the forms used for documenting failures and defect patterns were different. After training, participants received a brief explanation about task 1 and the task materials were distributed, i.e., the task 1 forms together with the logs and the application source code. Participants had 40 min to perform task 1, which consisted of extracting data from six failures in the logs and identifying and documenting two defect patterns found in the application source code. At the end, they filled in the follow-up form.
After performing task 1, the same groups of maintainers performed task 3. The pilot group received a 10-min break between task 1 and task 3; group A performed task 1 and task 3 on two different days; finally, group B had no interval between the tasks.
We started task 3 by distributing the forms of the task and the defect pattern documentation presented in Table 2. After that, we held a 20-min training session on task 3, showing how to identify false positives and how to document new defect fixing alternatives. Thereafter, the maintainers had 40 min to inspect 16 alerts of a defect pattern and classify them as defects or false positives. The provided set of alerts contained 3 defects and 13 false positives, the latter including two new fixing alternatives for the defect pattern. After finishing task 3, maintainers filled in the follow-up questionnaire. At the end of task 3, we asked the maintainers to fill in the TAM questionnaire.
We expected task 2 to be more difficult and time-consuming than task 3. Therefore, we were not able to apply task 2 to group A and decided to apply it to group B on a separate day. Group B performed this task directly using the source code abstract syntax tree, and none of the participants was able to complete the task. Hence, we decided to conduct new trials with groups C and D, using the SCPL DSL, to better understand the feasibility and difficulties.
Hence, task 2 was applied at a different time for groups C and D, which focused solely on this specific task. Also, due to the COVID-19 pandemic, it was applied to these groups in an online setting. We started task 2 with a 30-min introductory presentation about the PDM method. After that, a 20-min presentation of SCPL was held and the researchers helped students prepare their computers' environment for using the tool. For operational reasons (group C had 3-h classes once a week, while group D had 2-h classes twice a week), group C started the rule programming part of the task right after the SCPL presentation, while group D had a 2-day interval before starting this part. The rule programming started by distributing the form of the task; then, the maintainers had 50 min to implement the rule. Finally, they filled in the follow-up form.