An empirical study of data constraint implementations in Java

Abstract

Software systems are designed according to guidelines and constraints defined by business rules. Some of these constraints define the allowable or required values for data handled by the systems. These data constraints usually originate from the problem domain (e.g., regulations), and developers must write code that enforces them. Understanding how data constraints are implemented is essential for testing, debugging, and software change. Unfortunately, there are no widely accepted guidelines or best practices on how to implement data constraints. This paper presents an empirical study that investigates how data constraints are implemented in Java. We study the implementation of 187 data constraints extracted from the documentation of eight real-world Java software systems. First, we perform a qualitative analysis of the textual description of data constraints and identify four data constraint types. Second, we manually identify the implementations of these data constraints and reveal that they can be grouped into 30 implementation patterns. The analysis of these implementation patterns indicates that developers prefer a handful of patterns when implementing data constraints. We also find evidence suggesting that deviations from these patterns are associated with unusual implementation decisions or code smells. Third, we develop a tool-assisted protocol that allows us to identify 256 additional trace links for the data constraints implemented using the 13 most common patterns. We find that almost half of these data constraints have multiple enforcing statements, which are code clones of different types. Finally, a study with 16 professional developers indicates that the patterns we describe can be easily and accurately recognized in Java code.
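
To make the terminology above concrete, the following is a minimal sketch of what a data constraint and its enforcing statements might look like in Java. It is not taken from the studied systems or from the pattern catalog: the constraint text, the class name `PortConstraintExample`, and the bounds are hypothetical and chosen only for illustration. It shows one constraint enforced by two separate statements, the kind of duplication the abstract refers to when it mentions multiple enforcing statements.

```java
/**
 * Hypothetical illustration only. The constraint, names, and bounds are
 * assumptions and do not come from the study's data set.
 *
 * Assumed documentation text: "The port must be between 1 and 65535."
 */
final class PortConstraintExample {
    static final int MIN_PORT = 1;      // assumed lower bound
    static final int MAX_PORT = 65535;  // assumed upper bound

    /** Reports whether the value satisfies the constraint. */
    static boolean isValidPort(int port) {
        // First enforcing statement: a range comparison against constants.
        return port >= MIN_PORT && port <= MAX_PORT;
    }

    /** Rejects values that violate the constraint. */
    static int requireValidPort(int port) {
        // Second enforcing statement for the same constraint,
        // written in negated form as a guard.
        if (port < MIN_PORT || port > MAX_PORT) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return port;
    }

    public static void main(String[] args) {
        System.out.println(isValidPort(8080)); // prints "true"
        System.out.println(isValidPort(0));    // prints "false"
    }
}
```

A change to the documented bounds would have to be applied to both statements, which is one reason tracing every enforcing statement of a constraint matters for maintenance.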

Availability of Data and Material

We make available the data set used in our empirical study, as well as the data derived from it. Our replication package includes: software documents corresponding to eight software systems, constraints extracted from the documents, constraint-to-code traces, training and coding protocol material, and our catalog of constraint implementation patterns (Florez et al. 2022).

Code Availability

The source code of our enforcing statement identification tool is also included in our replication package.

Notes

  1. https://groups.google.com/g/mozilla-rhino/c/bdEX2Wa3pSQ/m/QXizSSdGEwAJ

References

  • Ain QU, Butt WH, Anwar MW, Azam F, Maqbool B (2019) A systematic review on code clone detection. IEEE Access 7:86121–86144. https://doi.org/10.1109/ACCESS.2019.2918202

  • Ali N, Guéhéneuc YG, Antoniol G (2011) Trust-based requirements traceability. In: Proceedings of the 19th IEEE international conference on program comprehension (ICPC). https://doi.org/10.1109/ICPC.2011.42, pp 111–120

  • Ali N, Sharafi Z, Guéhéneuc YG, Antoniol G (2012) An empirical study on requirements traceability using eye-tracking. In: Proceedings of the 28th international conference on software maintenance (ICSM). https://doi.org/10.1109/ICSM.2012.6405271, pp 191–200

  • Ali N, Guéhéneuc YG, Antoniol G (2013) Trustrace: mining software repositories to improve the accuracy of requirement traceability links. IEEE Trans Softw Eng 39(5):725–741. https://doi.org/10.1109/TSE.2012.71

  • Alspaugh TA, Scacchi W (2013) Ongoing software development without classical requirements. In: Proceedings of the 21st IEEE international requirements engineering conference (RE). https://doi.org/10.1109/RE.2013.6636716, pp 165–174

  • Antoniol G, Canfora G, Casazza G, De Lucia A, Merlo E (2002) Recovering traceability links between code and documentation. IEEE Trans Softw Eng 28(10):970–983. https://doi.org/10.1109/TSE.2002.1041053

  • Apache Ant (2021) Targets. https://archive.apache.org/dist/ant/manual/apache-ant-1.10.6-manual.zip

  • Baker BS (1995) On finding duplication and near-duplication in large software systems. In: Proceedings of 2nd working conference on reverse engineering. https://doi.org/10.1109/WCRE.1995.514697, pp 86–95

  • Bellon S, Koschke R, Antoniol G, Krinke J, Merlo E (2007) Comparison and evaluation of clone detection tools. IEEE Trans Softw Eng 33(9):577–591. https://doi.org/10.1109/TSE.2007.70725

  • Blasco D, Cetina C, Pastor Ó (2020) A fine-grained requirement traceability evolutionary algorithm: Kromaia, a commercial video game case study. Inf Softw Technol 119:106235. https://doi.org/10.1016/j.infsof.2019.106235

  • Borg M, Runeson P, Ardö A (2014) Recovering from a decade: a systematic mapping of information retrieval approaches to software traceability. Empir Softw Eng 19(6):1565–1616. https://doi.org/10.1007/s10664-013-9255-y

  • Breaux T, Antón A (2008) Analyzing regulatory rules for privacy and security requirements. IEEE Trans Softw Eng 34(1):5–20. https://doi.org/10.1109/TSE.2007.70746

  • Business Rules Group (2003) The Business Rules Manifesto. https://www.businessrulesgroup.org/brmanifesto.htm

  • Cemus K, Cerny T, Donahoo MJ (2015) Evaluation of approaches to business rules maintenance in enterprise information systems. In: Proceedings of the 2015 conference on research in adaptive and convergent systems, RACS. https://doi.org/10.1145/2811411.2811476. Association for Computing Machinery, New York, pp 324–329

  • Cerny T, Donahoo MJ (2011) How to reduce costs of business logic maintenance. In: 2011 IEEE International conference on computer science and automation engineering. https://doi.org/10.1109/CSAE.2011.5953174, vol 1, pp 77–82

  • Chaparro O, Aponte J, Ortega F, Marcus A (2012) Towards the automatic extraction of structural business rules from legacy databases. In: 2012 19th Working conference on reverse engineering. https://doi.org/10.1109/WCRE.2012.57, pp 479–488

  • Cleland-Huang J, Gotel OCZ, Huffman Hayes J, Mäder P, Zisman A (2014a) Software traceability: trends and future directions. In: Proceedings of the future of software engineering (FOSE 2014). https://doi.org/10.1145/2593882.2593891. ACM, New York, pp 55–69

  • Cleland-Huang J, Rahimi M, Mäder P (2014b) Achieving lightweight trustworthy traceability. In: Proceedings of the 22nd ACM SIGSOFT International symposium on foundations of software engineering, FSE 2014. https://doi.org/10.1145/2635868.2666612. Association for Computing Machinery, New York, pp 849–852

  • Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20(1):37–46

  • Cosentino V, Cabot J, Albert P, Bauquel P, Perronnet J (2012) A model driven reverse engineering framework for extracting business rules out of a java application. In: Bikakis A, Giurca A (eds) Rules on the web: research and applications. Lecture Notes in Computer Science. Springer, Berlin, pp 17–31

  • Cosentino V, Cabot J, Albert P, Bauquel P, Perronnet J (2013) Extracting business rules from COBOL: a model-based framework. In: Proceedings of the 20th working conference on reverse engineering (WCRE). https://doi.org/10.1109/WCRE.2013.6671316, pp 409–416

  • De Lucia A, Marcus A, Oliveto R, Poshyvanyk D (2012) Information retrieval methods for automated traceability recovery. In: Cleland-Huang J, Gotel O, Zisman A (eds) Software and systems traceability. https://doi.org/10.1007/978-1-4471-2239-5_4. Springer, London, pp 71–98

  • Dömges R, Pohl K (1998) Adapting traceability environments to project-specific needs. Commun ACM 41(12):54–62. https://doi.org/10.1145/290133.290149

  • Dong J, Zhao Y (2007) Experiments on design pattern discovery. In: Proceedings of the 3rd international workshop on predictor models in software engineering (PROMISE). https://doi.org/10.1109/PROMISE.2007.6. IEEE Computer Society, Washington, DC, pp 12–12

  • Eaddy M, Aho AV, Antoniol G, Guéhéneuc YG (2008a) CERBERUS: tracing requirements to source code using information retrieval, dynamic analysis, and program analysis. In: Proceedings of the 16th IEEE international conference on program comprehension. https://doi.org/10.1109/ICPC.2008.39, pp 53–62

  • Eaddy M, Zimmermann T, Sherwood KD, Garg V, Murphy GC, Nagappan N, Aho AV (2008b) Do crosscutting concerns cause defects? IEEE Trans Softw Eng 34(4):497–515. https://doi.org/10.1109/TSE.2008.36

  • Fard AM, Mesbah A (2013) JSNOSE: detecting JavaScript code smells. In: 2013 IEEE 13th International working conference on source code analysis and manipulation (SCAM). https://doi.org/10.1109/SCAM.2013.6648192, pp 116–125

  • Florez JM, Moreno L, Zhang Z, Wei S, Marcus A (2022) An empirical study of data constraint implementations in Java (Replication Package). https://doi.org/10.5281/zenodo.6624695

  • Fowler M (2018) Refactoring: improving the design of existing code. Addison-Wesley Professional, Boston

  • Gabel M, Jiang L, Su Z (2008) Scalable detection of semantic clones. In: Proceedings of the 30th international conference on software engineering, ICSE ’08. https://doi.org/10.1145/1368088.1368132. Association for Computing Machinery, New York, pp 321–330

  • Gamma E, Helm R, Johnson R, Vlissides J (1995) Design patterns: elements of reusable object-oriented software. Addison-Wesley Longman Publishing Co. Inc., Boston

  • Google (2021a) Guava: Google core libraries for java. https://github.com/google/guava

  • Google (2021b) Guava: preconditions. https://github.com/google/guava/wiki/PreconditionsExplained#preconditions

  • Guéhéneuc YG, Antoniol G (2008) DeMIMA: a multilayered approach for design pattern identification. IEEE Trans Softw Eng 34(5):667–684. https://doi.org/10.1109/TSE.2008.48

  • Guéhéneuc YG, Sahraoui HA, Zaidi F (2004) Fingerprinting design patterns. In: Proceedings of the 11th working conference on reverse engineering (WCRE). https://doi.org/10.1109/WCRE.2004.21, pp 172–181

  • Guo J, Zou Y (2008) Detecting clones in business applications. In: 2008 15th Working conference on reverse engineering. https://doi.org/10.1109/WCRE.2008.12, pp 91–100

  • Guo J, Cheng J, Cleland-Huang J (2017) Semantically enhanced software traceability using deep learning techniques. In: Proceedings of the 39th IEEE/ACM international conference on software engineering (ICSE). https://doi.org/10.1109/ICSE.2017.9, pp 3–14

  • Hatano T, Ishio T, Okada J, Sakata Y, Inoue K (2016) Dependency-based extraction of conditional statements for understanding business rules. IEICE Trans Inf Syst E99.D(4):1117–1126. https://doi.org/10.1587/transinf.2015EDP7202

  • Hay D, Healy KA (2000) Defining business rules ~ what are they really?, rev 1.3 edn. Business Rules Group

  • HTTP Working Group (2021) Hypertext transfer protocol—HTTP/1.0. https://www.w3.org/Protocols/HTTP/1.0/draft-ietf-http-spec.html

  • Huang H, Tsai WT, Bhattacharya S, Chen X, Wang Y, Sun J (1996) Business rule extraction from legacy code. In: Proceedings of the 20th international computer software and applications conference (COMPSAC). https://doi.org/10.1109/CMPSAC.1996.544158, pp 162–167

  • iTrust (2021a) Chronic disease risks. See replication package

  • iTrust (2021b) UC51 enter/edit basic health metrics. See replication package

  • JavaParser (2021) JavaParser. https://javaparser.org/

  • jEdit (2021) Closing and exiting. http://www.jedit.org/users-guide/closing-exiting.html

  • Joda-Time (2021) GregorianJulian (GJ) calendar system. https://www.joda.org/joda-time/calgj.html

  • Kaczor O, Guéhéneuc YG, Hamel S (2006) Efficient identification of design patterns with bit-vector algorithm. In: Proceedings of the 10th European conference on software maintenance and reengineering (CSMR). https://doi.org/10.1109/CSMR.2006.25, pp 10–184

  • Kapser C, Godfrey MW (2006) “Cloning considered harmful” considered harmful. In: Proceedings of the 13th working conference on reverse engineering (WCRE). https://doi.org/10.1109/WCRE.2006.1, pp 19–28

  • Komondoor R, Horwitz S (2001) Using slicing to identify duplication in source code. In: Cousot P (ed) Static analysis. Lecture Notes in Computer Science. https://doi.org/10.1007/3-540-47764-0_3. Springer, Berlin, pp 40–56

  • Krippendorff K (2004) Content analysis: an introduction to its methodology. Sage, Thousand Oaks

  • Kuang H, Nie J, Hu H, Rempel P, Lü J, Egyed A, Mäder P (2017) Analyzing closeness of code dependencies for improving IR-based traceability recovery. In: Proceedings of the 24th IEEE international conference on software analysis, evolution and reengineering (SANER). https://doi.org/10.1109/SANER.2017.7884610, pp 68–78

  • Larman C (2005) Applying UML and patterns: an introduction to object-oriented analysis and design and iterative development, 3rd edn. Prentice Hall PTR, Upper Saddle River

  • Livshits B, Sridharan M, Smaragdakis Y, Lhoták O, Amaral JN, Chang BYE, Guyer SZ, Khedker UP, Møller A, Vardoulakis D (2015) In defense of soundiness: a manifesto. Commun ACM 58(2):44–46. https://doi.org/10.1145/2644805

  • Mäder P, Jones PL, Zhang Y, Cleland-Huang J (2013) Strategic traceability for safety-critical projects. IEEE Softw 30(3):58–66. https://doi.org/10.1109/MS.2013.60

  • McMillan C, Poshyvanyk D, Revelle M (2009) Combining textual and structural analysis of software artifacts for traceability link recovery. In: Proceedings of the 5th ICSE workshop on traceability in emerging forms of software engineering (TEFSE). https://doi.org/10.1109/TEFSE.2009.5069582. IEEE Computer Society, Washington, DC, pp 41–48

  • Miles MB, Huberman AM, Saldaña J (2014) Qualitative data analysis: a methods sourcebook, 3rd edn. SAGE Publications, Inc, Thousand Oaks

  • Mirakhorli M, Cleland-Huang J (2016) Detecting, tracing, and monitoring architectural tactics in code. IEEE Trans Softw Eng 42(3):205–220. https://doi.org/10.1109/TSE.2015.2479217

  • Oualline S (1997) Practical C programming, 3rd edn. Nutshell Handbook. O’Reilly, Sebastopol

  • Pandita R, Xiao X, Zhong H, Xie T, Oney S, Paradkar A (2012) Inferring method specifications from natural language API descriptions. In: 2012 34th International conference on software engineering (ICSE). https://doi.org/10.1109/ICSE.2012.6227137, pp 815–825

  • Park C, Kang Y, Wu C, Yi K (2004) A static reference flow analysis to understand design pattern behavior. In: Proceedings of the 11th working conference on reverse engineering (WCRE). https://doi.org/10.1109/WCRE.2004.9, pp 300–301

  • Rahimi M, Goss W, Cleland-Huang J (2016) Evolving requirements-to-code trace links across versions of a software system. In: 2016 IEEE International conference on software maintenance and evolution (ICSME). https://doi.org/10.1109/ICSME.2016.57, pp 99–109

  • Razzaq A, Wasala A, Exton C, Buckley J (2018) The state of empirical evaluation in static feature location. Trans Softw Eng Methodol 28(1):2:1–2:58. https://doi.org/10.1145/3280988

  • Rempel P, Mäder P, Kuschke T, Cleland-Huang J (2014) Mind the gap: assessing the conformance of software traceability to relevant guidelines. In: Proceedings of the 36th IEEE/ACM international conference on software engineering (ICSE), ICSE 2014. https://doi.org/10.1145/2568225.2568290. Association for Computing Machinery, Hyderabad, pp 943–954

  • Rhino (2021) ECMAScript language specification. https://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262%203rd%20edition,%20December%201999.pdf

  • Roy CK, Cordy JR, Koschke R (2009) Comparison and evaluation of code clone detection techniques and tools: a qualitative approach. Sci Comput Program 74(7):470–495

  • Saied MA, Sahraoui H, Dufour B (2015) An observational study on API usage constraints and their documentation. In: 2015 IEEE 22nd International conference on software analysis, evolution, and reengineering (SANER). https://doi.org/10.1109/SANER.2015.7081813, pp 33–42

  • Shi N, Olsson RA (2006) Reverse engineering of design patterns from Java source code. In: Proceedings of the 21st IEEE/ACM international conference on automated software engineering (ASE). https://doi.org/10.1109/ASE.2006.57, pp 123–134

  • Sneed HM (2001) Extracting business logic from existing COBOL programs as a basis for redevelopment. In: Proceedings of the 9th IEEE workshop on program comprehension (IWPC). https://doi.org/10.1109/WPC.2001.921728, pp 167–175

  • Sneed HM, Erdös K (1996) Extracting business rules from source code. In: Proceedings of the 4th IEEE workshop on program comprehension (WPC). https://doi.org/10.1109/WPC.1996.501138, Berlin, pp 240–247

  • Sultanov H, Hayes JH, Kong WK (2011) Application of swarm techniques to requirements tracing. Requir Eng 16(3):209–226. https://doi.org/10.1007/s00766-011-0121-4

  • Swarm (2021) Seismic wave analysis and real-time monitor: user manual and reference guide. Version 2.8.10. https://github.com/usgs/swarm/blob/97f8b2f26830c764b816ca0a74270d5c0db35d06/docs/swarm_v2.pdf

  • Syed M, Nelson SC (2015) Guidelines for establishing reliability when coding narrative data. Emerging Adulthood 3(6):375–387. https://doi.org/10.1177/2167696815587648

  • Tan SH, Marinov D, Tan L, Leavens GT (2012) @tComment: testing Javadoc comments to detect comment-code inconsistencies. In: 2012 IEEE fifth international conference on software testing, verification and validation. https://doi.org/10.1109/ICST.2012.106, pp 260–269

  • Thummalapenta S, Cerulo L, Aversano L, Di Penta M (2010) An empirical study on the maintenance of source code clones. Empir Softw Eng 15(1):1–34. https://doi.org/10.1007/s10664-009-9108-x

  • Tip F (1994) A survey of program slicing techniques. Tech. rep. CWI, Centre for Mathematics and Computer Science, NLD

  • Tsantalis N, Chatzigeorgiou A, Stephanides G, Halkidis ST (2006) Design pattern detection using similarity scoring. IEEE Trans Softw Eng 32(11):896–909. https://doi.org/10.1109/TSE.2006.112

  • WALA (2021) WALA: T.J. Watson Libraries for analysis. https://github.com/wala/WALA

  • Wan-Kadir WMN, Loucopoulos P (2004) Relating evolving business rules to software design. J Syst Archit 50(7):367–382. https://doi.org/10.1016/j.sysarc.2003.09.006

  • Wang X, Sun J, Yang X, He Z, Maddineni S (2004) Business rules extraction from large legacy systems. In: Proceedings of the 8th European conference on software maintenance and reengineering (CSMR). https://doi.org/10.1109/CSMR.2004.1281426, pp 249–258

  • Wiegers KE, Beatty J (2013) Software requirements, 3rd edn. Microsoft Press, Redmond

  • Witt GC (2012) Writing effective business rules: a practical method. Morgan Kaufmann, Waltham

  • Xiao X, Paradkar A, Thummalapenta S, Xie T (2012) Automated extraction of security policies from natural-language software documents. In: Proceedings of the ACM SIGSOFT 20th international symposium on the foundations of software engineering, FSE ’12. https://doi.org/10.1145/2393596.2393608. Association for Computing Machinery, New York, pp 1–11

  • Yang J, Sethi U, Yan C, Cheung A, Lu S (2020) Managing data constraints in database-backed web applications. In: Proceedings of the 42nd ACM/IEEE international conference on software engineering (ICSE), ICSE ’20. https://doi.org/10.1145/3377811.3380375. Association for Computing Machinery, New York, pp 1098–1109

  • Zhou Y, Gu R, Chen T, Huang Z, Panichella S, Gall H (2017) Analyzing APIs documentation and code to detect directive defects. In: 2017 IEEE/ACM 39th international conference on software engineering (ICSE). https://doi.org/10.1109/ICSE.2017.11, pp 27–37

  • Zogaan W, Sharma P, Mirakhorli M, Arnaoudova V (2017) Datasets from fifteen years of automated requirements traceability research: current state, characteristics, and quality. In: Proceedings of the 25th IEEE international requirements engineering conference (RE). https://doi.org/10.1109/RE.2017.80, pp 110–121

Funding

This research was supported in part by grants from the National Science Foundation: CCF-1848608, CCF-1910976, CCF-1955837.

Author information

Corresponding author

Correspondence to Juan Manuel Florez.

Ethics declarations

Conflict of Interests

There are no conflicts of interest or competing interests to disclose.

Additional information

Communicated by: Alexandre Bergel

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Constraint Implementation Patterns Catalog

Tables 9, 10, and 11 contain descriptions of 23 of the constraint implementation patterns (CIPs) from our catalog. The 7 most common patterns can be found in Table 4.

Table 9 CIP catalog, part 2
Table 10 CIP catalog, part 3
Table 11 CIP catalog, part 4

About this article

Cite this article

Florez, J.M., Moreno, L., Zhang, Z. et al. An empirical study of data constraint implementations in Java. Empir Software Eng 27, 119 (2022). https://doi.org/10.1007/s10664-022-10175-w

  • DOI: https://doi.org/10.1007/s10664-022-10175-w

Keywords

  • Business rule
  • Data constraint
  • Empirical study
  • Code pattern
  • Discourse analysis