Using artificial neural networks to provide guidance in extending PL/SQL programs

Ersoy, Ersin; Sözer, Hasan

doi:10.1007/s11219-022-09586-1

Published: 19 March 2022

Volume 30, pages 885–916 (2022)

Software Quality Journal

This article has been updated

Abstract

Extending legacy systems with new objects for contemporary functionality or technology can lead to architecture erosion. Misplacement of these objects gradually degrades the modular structure, whose documentation is usually missing or outdated. In this work, we address this problem for PL/SQL programs, which are highly coupled with databases. We propose a novel approach that employs artificial neural networks to automatically predict the correct placement of a new object among architectural modules. We train a network on features extracted from the initial version of the source code, which is assumed to represent the intended architecture. The features are the dependencies among the software and database objects. Then, given a new object and the list of other objects it uses, the network predicts the architectural module in which the object should be included. We performed two industrial case studies with applications from the telecommunications domain, each of which involves thousands of procedures and database tables. Our approach achieved 86.7% and 89% accuracy for these two applications, whereas the baseline approach that uses coupling and cohesion metrics reached 55.5% and 57.4% accuracy for the same applications, respectively.
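The prediction setting described in the abstract can be made concrete with a short sketch. It assumes that a new object's dependencies are encoded as a binary vector over a fixed vocabulary of known software and database objects (1 if the new object uses that object), which is one plausible encoding rather than necessarily the one used in the paper. For contrast, it also includes a deliberately simplified, hypothetical coupling-only heuristic: place the new object in the module that owns most of the objects it uses. This heuristic is not the paper's coupling-and-cohesion baseline, and all object and module names are invented for illustration.

```python
from collections import Counter


def encode_dependencies(used_objects, vocabulary):
    """Binary feature vector: position i is 1 if the new object uses vocabulary[i]."""
    used = set(used_objects)
    return [1 if name in used else 0 for name in vocabulary]


def coupling_baseline(used_objects, module_of):
    """Hypothetical coupling-only heuristic: return the module that owns the
    largest number of the objects used by the new object (None if none are known)."""
    counts = Counter(module_of[name] for name in used_objects if name in module_of)
    if not counts:
        return None
    # Counter.most_common orders equal counts by first occurrence.
    return counts.most_common(1)[0][0]


# Invented example: existing procedures/tables and the module each belongs to.
vocabulary = ["CUSTOMER", "ORDERS", "BILLING_PKG.CHARGE", "INVOICE"]
module_of = {
    "CUSTOMER": "CRM_CORE",
    "ORDERS": "ORDERING",
    "BILLING_PKG.CHARGE": "BILLING",
    "INVOICE": "BILLING",
}
new_object_uses = ["INVOICE", "BILLING_PKG.CHARGE", "CUSTOMER"]

features = encode_dependencies(new_object_uses, vocabulary)  # [1, 0, 1, 1]
suggested = coupling_baseline(new_object_uses, module_of)    # "BILLING"
```

Vectors produced this way could be fed to a multilayer perceptron classifier (for example, scikit-learn's `MLPClassifier`) whose output classes are the architectural modules, so that the network learns placement patterns from the initial version of the code instead of relying on a single counting rule.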

Change history

16 April 2022
The original version of this article was updated to correct the article title.

Acknowledgements

We would like to thank software developers at Turkcell for sharing their code base with us and supporting our case studies. We would also like to thank the anonymous reviewers, who helped us to improve the quality of this paper significantly.

Author information

Authors and Affiliations

Ersin Ersoy: Turkcell Group, Aydınevler Mah. İnönü Cad., Küçükyalı Ofispark, İstanbul, Turkey
Hasan Sözer: Ozyegin University, İstanbul, Turkey

Corresponding author

Correspondence to Ersin Ersoy.

Ethics declarations

Conflicts of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 18 lists the top ten accuracy values achieved with the CRM application datasets CRM-T, CRM-TV, CRM-TVPF and CRM-ALL. Each cell of Table 18 has two parts for each dataset: the first part shows the hyperparameter settings that yielded the top 10 results, and the second part shows the results obtained with each of these settings.
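The following sketch shows one way a ranked top-ten list of hyperparameter settings can be produced, using random search over a hyperparameter space. Random search is an assumption made for illustration, not necessarily the tuning procedure behind Tables 18 and 19. The search space (`hidden_layers`, `neurons_per_layer`, `activation`, `learning_rate`) and the `toy_evaluate` scoring function are hypothetical placeholders; in practice the evaluation function would train the network with the given settings and return its accuracy on held-out data.

```python
import heapq
import random

# Hypothetical search space; the hyperparameters reported in Tables 18 and 19 may differ.
SEARCH_SPACE = {
    "hidden_layers": [1, 2],
    "neurons_per_layer": [32, 64, 128, 256],
    "activation": ["relu", "tanh", "logistic"],
    "learning_rate": [0.001, 0.01, 0.1],
}


def random_search_top_k(evaluate, space, n_trials=50, k=10, seed=0):
    """Sample n_trials random configurations, score each with evaluate(config),
    and return the k best (score, config) pairs in descending score order."""
    rng = random.Random(seed)  # seeded for reproducible sampling
    results = []
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        results.append((evaluate(config), config))
    # Rank on the score only; dicts are not orderable, so a key is required.
    return heapq.nlargest(k, results, key=lambda pair: pair[0])


def toy_evaluate(config):
    """Placeholder score; a real run would train the ANN and return validation accuracy."""
    return config["neurons_per_layer"] / 256 - config["learning_rate"]


top = random_search_top_k(toy_evaluate, SEARCH_SPACE)
for score, config in top:
    print(f"{score:.3f}", config)
```

Keeping the (score, configuration) pairs together mirrors the two-part layout of the appendix tables: each ranked result can be reported alongside the settings that produced it.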

Table 18 Top 10 accuracy values obtained with CRM case study datasets, and the corresponding ANN hyperparameters

Table 19 Top 10 accuracy values obtained with CMS case study datasets, and the corresponding ANN hyperparameters

In the second part, the first row lists the accuracy scores, the second row lists the macro-averaged precision, recall and F1 scores, and the third row lists the weighted-average precision, recall and F1 scores. Similarly, Table 19 lists the top ten accuracy values achieved with the CMS application datasets CMS-T, CMS-TVPF and CMS-ALL.
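The difference between the macro and weighted rows can be illustrated with a small sketch based on the standard definitions. Macro averaging takes the unweighted mean of the per-module scores, so a small module counts as much as a large one; weighted averaging weights each module's score by its support (the number of objects that truly belong to it). The function names and toy labels are hypothetical; we assume the results agree with scikit-learn's `average="macro"` and `average="weighted"` options when undefined ratios are set to zero.

```python
from collections import Counter


def per_class_prf(y_true, y_pred, label):
    """Precision, recall and F1 for one module label (0.0 when a ratio is undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_and_weighted(y_true, y_pred):
    """Return ((P, R, F1) macro-averaged, (P, R, F1) support-weighted)."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true instances per module
    scores = {lab: per_class_prf(y_true, y_pred, lab) for lab in labels}
    macro = tuple(sum(scores[lab][i] for lab in labels) / len(labels) for i in range(3))
    weighted = tuple(
        sum(scores[lab][i] * support[lab] for lab in labels) / len(y_true)
        for i in range(3)
    )
    return macro, weighted


# Toy predictions for six objects and three hypothetical modules A, B and C.
y_true = ["A", "A", "A", "B", "B", "C"]
y_pred = ["A", "A", "B", "B", "C", "C"]
macro, weighted = macro_and_weighted(y_true, y_pred)
```

In this toy case, macro precision is 2/3 while weighted precision is 0.75, because the largest module (A, three objects) has perfect precision and therefore dominates the weighted average. Weighted recall always equals accuracy (4/6 here), since each module's recall multiplied by its support is just its number of correct predictions.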

About this article

Cite this article

Ersoy, E., Sözer, H. Using artificial neural networks to provide guidance in extending PL/SQL programs. Software Qual J 30, 885–916 (2022). https://doi.org/10.1007/s11219-022-09586-1

Accepted: 31 January 2022
Published: 19 March 2022
Issue Date: December 2022
DOI: https://doi.org/10.1007/s11219-022-09586-1
