Abstract
Software systems are designed according to guidelines and constraints defined by business rules. Some of these constraints define the allowable or required values for data handled by the systems. These data constraints usually originate from the problem domain (e.g., regulations), and developers must write code that enforces them. Understanding how data constraints are implemented is essential for testing, debugging, and software change. Unfortunately, there are no widely accepted guidelines or best practices on how to implement data constraints. This paper presents an empirical study that investigates how data constraints are implemented in Java. We study the implementation of 187 data constraints extracted from the documentation of eight real-world Java software systems. First, we perform a qualitative analysis of the textual description of data constraints and identify four data constraint types. Second, we manually identify the implementations of these data constraints and reveal that they can be grouped into 31 implementation patterns. The analysis of these implementation patterns indicates that developers prefer a handful of patterns when implementing data constraints. We also find evidence suggesting that deviations from these patterns are associated with unusual implementation decisions or code smells. Third, we develop a tool-assisted protocol that allows us to identify 256 additional trace links for the data constraints implemented using the 13 most common patterns. We find that almost half of these data constraints have multiple enforcing statements, and that these statements are code clones of different types. Finally, a study with 16 professional developers indicates that the patterns we describe can be easily and accurately recognized in Java code.
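To make the idea of an enforcing statement concrete, the following is a minimal sketch under assumed names: the class `MonthConstraintDemo`, its methods, and the rule "a month must be between 1 and 12" are hypothetical illustrations, not constraints from the studied systems and not entries from the paper's pattern catalog. It shows one value-range data constraint enforced three ways: by an if-throw guard, by a boolean predicate, and by a near-duplicate guard in a second method. The last two checks illustrate the kind of repeated, cloned enforcing statements the abstract reports. Libraries such as Guava offer `Preconditions.checkArgument` for the guard style, but the sketch avoids that dependency.

```java
/**
 * Hypothetical example: a value-range data constraint ("the month must be
 * between 1 and 12") enforced by several statements in the same code base.
 */
final class MonthConstraintDemo {

    static final int MIN_MONTH = 1;
    static final int MAX_MONTH = 12;

    // Enforcement style 1: an explicit guard that rejects invalid input by
    // throwing an exception before the value is used.
    static int checkMonthWithGuard(int month) {
        if (month < MIN_MONTH || month > MAX_MONTH) {
            throw new IllegalArgumentException(
                "month must be in [" + MIN_MONTH + ", " + MAX_MONTH + "]: " + month);
        }
        return month;
    }

    // Enforcement style 2: the same condition expressed as a boolean predicate;
    // the constraint is enforced wherever callers act on the result.
    static boolean isValidMonth(int month) {
        return month >= MIN_MONTH && month <= MAX_MONTH;
    }

    // A second enforcing statement for the same constraint in another method:
    // it repeats the guard above with a renamed variable and inlined literals,
    // a near-duplicate that clone detectors may report.
    static String monthLabel(int m) {
        if (m < 1 || m > 12) {
            throw new IllegalArgumentException("invalid month: " + m);
        }
        // java.time.Month.of(int) maps 1..12 to JANUARY..DECEMBER.
        return java.time.Month.of(m).name();
    }
}
```

Because the constraint's bounds appear both as named constants and as inlined literals, a change to the rule would have to be propagated to every copy, which is one reason the location of all enforcing statements matters for maintenance.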
Availability of Data and Material
We make available the data set used in our empirical study, as well as the data derived from it. Our replication package includes: software documents corresponding to eight software systems, constraints extracted from the documents, constraint-to-code traces, training and coding protocol material, and our catalog of constraint implementation patterns (Florez et al. 2022).
Code Availability
The source code of our enforcing statement identification tool is also included in our replication package.
References
Ain QU, Butt WH, Anwar MW, Azam F, Maqbool B (2019) A systematic review on code clone detection. IEEE Access 7:86121–86144. https://doi.org/10.1109/ACCESS.2019.2918202
Ali N, Guéhéneuc YG, Antoniol G (2011) Trust-based requirements traceability. In: Proceedings of the 19th IEEE international conference on program comprehension (ICPC). https://doi.org/10.1109/ICPC.2011.42, pp 111–120
Ali N, Sharafi Z, Guéhéneuc YG, Antoniol G (2012) An empirical study on requirements traceability using eye-tracking. In: Proceedings of the 28th international conference on software maintenance (ICSM). https://doi.org/10.1109/ICSM.2012.6405271, pp 191–200
Ali N, Guéhéneuc YG, Antoniol G (2013) Trustrace: mining software repositories to improve the accuracy of requirement traceability links. IEEE Trans Softw Eng 39(5):725–741. https://doi.org/10.1109/TSE.2012.71
Alspaugh TA, Scacchi W (2013) Ongoing software development without classical requirements. In: Proceedings of the 21st IEEE international requirements engineering conference (RE). https://doi.org/10.1109/RE.2013.6636716, pp 165–174
Antoniol G, Canfora G, Casazza G, De Lucia A, Merlo E (2002) Recovering traceability links between code and documentation. IEEE Trans Softw Eng 28(10):970–983. https://doi.org/10.1109/TSE.2002.1041053
Apache Ant (2021) Targets. https://archive.apache.org/dist/ant/manual/apache-ant-1.10.6-manual.zip
Baker BS (1995) On finding duplication and near-duplication in large software systems. In: Proceedings of 2nd working conference on reverse engineering. https://doi.org/10.1109/WCRE.1995.514697, pp 86–95
Bellon S, Koschke R, Antoniol G, Krinke J, Merlo E (2007) Comparison and evaluation of clone detection tools. IEEE Trans Softw Eng 33(9):577–591. https://doi.org/10.1109/TSE.2007.70725
Blasco D, Cetina C, Pastor Ó (2020) A fine-grained requirement traceability evolutionary algorithm: Kromaia, a commercial video game case study. Inf Softw Technol 119:106235. https://doi.org/10.1016/j.infsof.2019.106235
Borg M, Runeson P, Ardö A (2014) Recovering from a decade: a systematic mapping of information retrieval approaches to software traceability. Empir Softw Eng 19(6):1565–1616. https://doi.org/10.1007/s10664-013-9255-y
Breaux T, Antón A (2008) Analyzing regulatory rules for privacy and security requirements. IEEE Trans Softw Eng 34(1):5–20. https://doi.org/10.1109/TSE.2007.70746
Business Rules Group (2003) The Business Rules Manifesto. https://www.businessrulesgroup.org/brmanifesto.htm
Cemus K, Cerny T, Donahoo MJ (2015) Evaluation of approaches to business rules maintenance in enterprise information systems. In: Proceedings of the 2015 conference on research in adaptive and convergent systems, RACS. https://doi.org/10.1145/2811411.2811476. Association for Computing Machinery, New York, pp 324–329
Cerny T, Donahoo MJ (2011) How to reduce costs of business logic maintenance. In: 2011 IEEE International conference on computer science and automation engineering. https://doi.org/10.1109/CSAE.2011.5953174, vol 1, pp 77–82
Chaparro O, Aponte J, Ortega F, Marcus A (2012) Towards the automatic extraction of structural business rules from legacy databases. In: 2012 19th Working conference on reverse engineering. https://doi.org/10.1109/WCRE.2012.57, pp 479–488
Cleland-Huang J, Gotel OCZ, Huffman Hayes J, Mäder P, Zisman A (2014a) Software traceability: trends and future directions. In: Proceedings of the on future of software engineering (FOSE 2014), FOSE 2014. https://doi.org/10.1145/2593882.2593891. ACM, New York, pp 55–69
Cleland-Huang J, Rahimi M, Mäder P (2014b) Achieving lightweight trustworthy traceability. In: Proceedings of the 22nd ACM SIGSOFT International symposium on foundations of software engineering, FSE 2014. https://doi.org/10.1145/2635868.2666612. Association for Computing Machinery, New York, pp 849–852
Cohen J (1960) A coefficient of agreement for nominal scales. Educ Psychol Meas 20(1):37–46
Cosentino V, Cabot J, Albert P, Bauquel P, Perronnet J (2012) A model driven reverse engineering framework for extracting business rules out of a java application. In: Bikakis A, Giurca A (eds) Rules on the web: research and applications. Lecture Notes in Computer Science. Springer, Berlin, pp 17–31
Cosentino V, Cabot J, Albert P, Bauquel P, Perronnet J (2013) Extracting business rules from COBOL: a model-based framework. In: Proceedings of the 20th working conference on reverse engineering (WCRE). https://doi.org/10.1109/WCRE.2013.6671316, pp 409–416
De Lucia A, Marcus A, Oliveto R, Poshyvanyk D (2012) Information retrieval methods for automated traceability recovery. In: Cleland-Huang J, Gotel O, Zisman A (eds) Software and systems traceability. https://doi.org/10.1007/978-1-4471-2239-5_4. Springer, London, pp 71–98
Dömges R, Pohl K (1998) Adapting traceability environments to project-specific needs. Commun ACM 41(12):54–62. https://doi.org/10.1145/290133.290149
Dong J, Zhao Y (2007) Experiments on design pattern discovery. In: Proceedings of the 3rd international workshop on predictor models in software engineering (PROMISE). https://doi.org/10.1109/PROMISE.2007.6. IEEE Computer Society, Washington, DC, pp 12–12
Eaddy M, Aho AV, Antoniol G, Guéhéneuc YG (2008a) CERBERUS: tracing requirements to source code using information retrieval, dynamic analysis, and program analysis. In: Proceedings of the 16th IEEE international conference on program comprehension. https://doi.org/10.1109/ICPC.2008.39, pp 53–62
Eaddy M, Zimmermann T, Sherwood KD, Garg V, Murphy GC, Nagappan N, Aho AV (2008b) Do crosscutting concerns cause defects? IEEE Trans Softw Eng 34(4):497–515. https://doi.org/10.1109/TSE.2008.36
Fard AM, Mesbah A (2013) JSNOSE: detecting JavaScript code smells. In: 2013 IEEE 13th International working conference on source code analysis and manipulation (SCAM). https://doi.org/10.1109/SCAM.2013.6648192, pp 116–125
Florez JM, Moreno L, Zhang Z, Wei S, Marcus A (2022) An empirical study of data constraint implementations in Java (Replication Package). https://doi.org/10.5281/zenodo.6624695
Fowler M (2018) Refactoring: improving the design of existing code. Addison-Wesley Professional, Boston
Gabel M, Jiang L, Su Z (2008) Scalable detection of semantic clones. In: Proceedings of the 30th international conference on software engineering, ICSE ’08. https://doi.org/10.1145/1368088.1368132. Association for Computing Machinery, New York, pp 321–330
Gamma E, Helm R, Johnson R, Vlissides J (1995) Design patterns: elements of reusable object-oriented software. Addison-Wesley Longman Publishing Co. Inc., Boston
Google (2021a) Guava: Google core libraries for java. https://github.com/google/guava
Google (2021b) Guava: preconditions. https://github.com/google/guava/wiki/PreconditionsExplained#preconditions
Guéhéneuc YG, Antoniol G (2008) DeMIMA: a multilayered approach for design pattern identification. IEEE Trans Softw Eng 34(5):667–684. https://doi.org/10.1109/TSE.2008.48
Guéhéneuc YG, Sahraoui HA, Zaidi F (2004) Fingerprinting design patterns. In: Proceedings of the 11th working conference on reverse engineering (WCRE). https://doi.org/10.1109/WCRE.2004.21, pp 172–181
Guo J, Zou Y (2008) Detecting clones in business applications. In: 2008 15th Working conference on reverse engineering. https://doi.org/10.1109/WCRE.2008.12, pp 91–100
Guo J, Cheng J, Cleland-Huang J (2017) Semantically enhanced software traceability using deep learning techniques. In: Proceedings of the 39th IEEE/ACM international conference on software engineering (ICSE). https://doi.org/10.1109/ICSE.2017.9, pp 3–14
Hatano T, Ishio T, Okada J, Sakata Y, Inoue K (2016) Dependency-based extraction of conditional statements for understanding business rules. IEICE Trans Inf Syst E99.D(4):1117–1126. https://doi.org/10.1587/transinf.2015EDP7202
Hay D, Healy KA (2000) Defining business rules ~ what are they really?, rev 1.3 edn. Business Rule Group
HTTP Working Group (2021) Hypertext transfer protocol—HTTP/1.0. https://www.w3.org/Protocols/HTTP/1.0/draft-ietf-http-spec.html
Huang H, Tsai WT, Bhattacharya S, Chen X, Wang Y, Sun J (1996) Business rule extraction from legacy code. In: Proceedings of the 20th international computer software and applications conference (COMPSAC). https://doi.org/10.1109/CMPSAC.1996.544158, pp 162–167
iTrust (2021a) Chronic disease risks. See replication package
iTrust (2021b) UC51 enter/edit basic health metrics. See replication package
JavaParser (2021) JavaParser. https://javaparser.org/
jEdit (2021) Closing and exiting. http://www.jedit.org/users-guide/closing-exiting.html
Joda-Time (2021) GregorianJulian (GJ) calendar system. https://www.joda.org/joda-time/calgj.html
Kaczor O, Guéhéneuc YG, Hamel S (2006) Efficient identification of design patterns with bit-vector algorithm. In: Proceedings of the 10th European conference on software maintenance and reengineering (CSMR). https://doi.org/10.1109/CSMR.2006.25, pp 10–184
Kapser C, Godfrey MW (2006) “Cloning considered harmful” considered harmful. In: Proceedings of the 13th working conference on reverse engineering (WCRE). https://doi.org/10.1109/WCRE.2006.1, pp 19–28
Komondoor R, Horwitz S (2001) Using slicing to identify duplication in source code. In: Cousot P (ed) Static analysis. Lecture Notes in Computer Science. https://doi.org/10.1007/3-540-47764-0_3. Springer, Berlin, pp 40–56
Krippendorff K (2004) Content analysis: an introduction to its methodology. Sage, Thousand Oaks
Kuang H, Nie J, Hu H, Rempel P, Lü J, Egyed A, Mäder P (2017) Analyzing closeness of code dependencies for improving IR-based traceability recovery. In: Proceedings of the 24th IEEE international conference on software analysis, evolution and reengineering (SANER). https://doi.org/10.1109/SANER.2017.7884610, pp 68–78
Larman C (2005) Applying UML and patterns: an introduction to object-oriented analysis and design and iterative development, 3rd edn. Prentice Hall PTR, Upper Saddle River
Livshits B, Sridharan M, Smaragdakis Y, Lhoták O, Amaral JN, Chang BYE, Guyer SZ, Khedker UP, Møller A, Vardoulakis D (2015) In defense of soundiness: a manifesto. Commun ACM 58(2):44–46. https://doi.org/10.1145/2644805
Mäder P, Jones PL, Zhang Y, Cleland-Huang J (2013) Strategic traceability for safety-critical projects. IEEE Softw 30(3):58–66. https://doi.org/10.1109/MS.2013.60
McMillan C, Poshyvanyk D, Revelle M (2009) Combining textual and structural analysis of software artifacts for traceability link recovery. In: Proceedings of the 5th ICSE workshop on traceability in emerging forms of software engineering (TEFSE). https://doi.org/10.1109/TEFSE.2009.5069582. IEEE Computer Society, Washington, DC, pp 41–48
Miles MB, Huberman AM, Saldaña J (2014) Qualitative data analysis: a methods sourcebook, 3rd edn. SAGE Publications, Inc, Thousand Oaks
Mirakhorli M, Cleland-Huang J (2016) Detecting, tracing, and monitoring architectural tactics in code. IEEE Trans Softw Eng 42(3):205–220. https://doi.org/10.1109/TSE.2015.2479217
Oualline S (1997) Practical C programming, 3rd edn. Nutshell Handbook. O’Reilly, Sebastopol
Pandita R, Xiao X, Zhong H, Xie T, Oney S, Paradkar A (2012) Inferring method specifications from natural language API descriptions. In: 2012 34th International conference on software engineering (ICSE). https://doi.org/10.1109/ICSE.2012.6227137, pp 815–825
Park C, Kang Y, Wu C, Yi K (2004) A static reference flow analysis to understand design pattern behavior. In: Proceedings of the 11th working conference on reverse engineering (WCRE). https://doi.org/10.1109/WCRE.2004.9, pp 300–301
Rahimi M, Goss W, Cleland-Huang J (2016) Evolving requirements-to-code trace links across versions of a software system. In: 2016 IEEE International conference on software maintenance and evolution (ICSME). https://doi.org/10.1109/ICSME.2016.57, pp 99–109
Razzaq A, Wasala A, Exton C, Buckley J (2018) The state of empirical evaluation in static feature location. ACM Trans Softw Eng Methodol 28(1):2:1–2:58. https://doi.org/10.1145/3280988
Rempel P, Mäder P, Kuschke T, Cleland-Huang J (2014) Mind the gap: assessing the conformance of software traceability to relevant guidelines. In: Proceedings of the 36th IEEE/ACM international conference on software engineering (ICSE), ICSE 2014. https://doi.org/10.1145/2568225.2568290. Association for Computing Machinery, Hyderabad, pp 943–954
Rhino (2021) ECMAScript language specification. https://www.ecma-international.org/publications/files/ECMA-ST-ARCH/ECMA-262%203rd%20edition,%20December%201999.pdf
Roy CK, Cordy JR, Koschke R (2009) Comparison and evaluation of code clone detection techniques and tools: a qualitative approach. Sci Comput Program 74(7):470–495
Saied MA, Sahraoui H, Dufour B (2015) An observational study on API usage constraints and their documentation. In: 2015 IEEE 22nd International conference on software analysis, evolution, and reengineering (SANER). https://doi.org/10.1109/SANER.2015.7081813, pp 33–42
Shi N, Olsson RA (2006) Reverse engineering of design patterns from Java source code. In: Proceedings of the 21st IEEE/ACM international conference on automated software engineering (ASE). https://doi.org/10.1109/ASE.2006.57, pp 123–134
Sneed HM (2001) Extracting business logic from existing COBOL programs as a basis for redevelopment. In: Proceedings of the 9th IEEE workshop on program comprehension (IWPC). https://doi.org/10.1109/WPC.2001.921728, pp 167–175
Sneed HM, Erdös K (1996) Extracting business rules from source code. In: Proceedings of the 4th IEEE workshop on program comprehension (WPC). https://doi.org/10.1109/WPC.1996.501138, Berlin, pp 240–247
Sultanov H, Hayes JH, Kong WK (2011) Application of swarm techniques to requirements tracing. Requir Eng 16(3):209–226. https://doi.org/10.1007/s00766-011-0121-4
Swarm (2021) Seismic wave analysis and real-time monitor: user manual and reference guide. Version 2.8.10. https://github.com/usgs/swarm/blob/97f8b2f26830c764b816ca0a74270d5c0db35d06/docs/swarm_v2.pdf
Syed M, Nelson SC (2015) Guidelines for establishing reliability when coding narrative data. Emerging Adulthood 3(6):375–387. https://doi.org/10.1177/2167696815587648
Tan SH, Marinov D, Tan L, Leavens GT (2012) @tComment: testing Javadoc comments to detect comment-code inconsistencies. In: 2012 IEEE fifth international conference on software testing, verification and validation (ICST). https://doi.org/10.1109/ICST.2012.106, pp 260–269
Thummalapenta S, Cerulo L, Aversano L, Di Penta M (2010) An empirical study on the maintenance of source code clones. Empir Softw Eng 15(1):1–34. https://doi.org/10.1007/s10664-009-9108-x
Tip F (1994) A survey of program slicing techniques. Tech. rep. CWI, Centre for Mathematics and Computer Science, NLD
Tsantalis N, Chatzigeorgiou A, Stephanides G, Halkidis ST (2006) Design pattern detection using similarity scoring. IEEE Trans Softw Eng 32(11):896–909. https://doi.org/10.1109/TSE.2006.112
WALA (2021) WALA: T.J. Watson Libraries for analysis. https://github.com/wala/WALA
Wan-Kadir WMN, Loucopoulos P (2004) Relating evolving business rules to software design. J Syst Archit 50(7):367–382. https://doi.org/10.1016/j.sysarc.2003.09.006
Wang X, Sun J, Yang X, He Z, Maddineni S (2004) Business rules extraction from large legacy systems. In: Proceedings of the 8th European conference on software maintenance and reengineering (CSMR). https://doi.org/10.1109/CSMR.2004.1281426, pp 249–258
Wiegers KE, Beatty J (2013) Software requirements, 3rd edn. Microsoft Press, Redmond
Witt GC (2012) Writing effective business rules: a practical method. Morgan Kaufmann, Waltham
Xiao X, Paradkar A, Thummalapenta S, Xie T (2012) Automated extraction of security policies from natural-language software documents. In: Proceedings of the ACM SIGSOFT 20th international symposium on the foundations of software engineering, FSE ’12. https://doi.org/10.1145/2393596.2393608. Association for Computing Machinery, New York, pp 1–11
Yang J, Sethi U, Yan C, Cheung A, Lu S (2020) Managing data constraints in database-backed web applications. In: Proceedings of the 42nd ACM/IEEE international conference on software engineering (ICSE), ICSE ’20. https://doi.org/10.1145/3377811.3380375. Association for Computing Machinery, New York, pp 1098–1109
Zhou Y, Gu R, Chen T, Huang Z, Panichella S, Gall H (2017) Analyzing APIs documentation and code to detect directive defects. In: 2017 IEEE/ACM 39th international conference on software engineering (ICSE). https://doi.org/10.1109/ICSE.2017.11, pp 27–37
Zogaan W, Sharma P, Mirakhorli M, Arnaoudova V (2017) Datasets from fifteen years of automated requirements traceability research: current state, characteristics, and quality. In: Proceedings of the 25th IEEE international requirements engineering conference (RE). https://doi.org/10.1109/RE.2017.80, pp 110–121
Funding
This research was supported in part by grants from the National Science Foundation: CCF-1848608, CCF-1910976, CCF-1955837.
Ethics declarations
Conflict of Interests
There are no conflicts of interest or competing interests to disclose.
Additional information
Communicated by: Alexandre Bergel
About this article
Cite this article
Florez, J.M., Moreno, L., Zhang, Z. et al. An empirical study of data constraint implementations in Java. Empir Software Eng 27, 119 (2022). https://doi.org/10.1007/s10664-022-10175-w