Using artificial neural networks to provide guidance in extending PL/SQL programs

Software Quality Journal

This article has been updated

Abstract

Extending legacy systems with new objects for contemporary functionality or technology can lead to architecture erosion. Misplacing these objects gradually degrades the modular structure, whose documentation is usually missing or outdated. In this work, we address this problem for PL/SQL programs, which are highly coupled with databases. We propose a novel approach that employs artificial neural networks to automatically predict the correct placement of a new object among architectural modules. We train a network on features extracted from the initial version of the source code, which is assumed to represent the intended architecture. The features used for training are the dependencies among the software and database objects. Then, given a new object and the list of other objects it uses, the network predicts the architectural module in which the object should be placed. We performed two industrial case studies with applications from the telecommunications domain, each of which involves thousands of procedures and database tables. Our approach achieved 86.7% and 89% accuracy for these two applications. The baseline approach that uses coupling and cohesion metrics reached 55.5% and 57.4% accuracy for the same applications, respectively.
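To make the feature idea concrete, the following is a minimal sketch of how dependency lists could be turned into fixed-length binary vectors that a classifier can consume. It is an illustration under stated assumptions, not the authors' implementation: the object names, the module labels, and the helper functions `build_vocabulary` and `encode` are all hypothetical, and the multi-hot encoding is one plausible reading of "dependencies as features".

```python
from typing import Dict, List

def build_vocabulary(dependencies: Dict[str, List[str]]) -> List[str]:
    """Collect every object that is used by some other object, in sorted order."""
    return sorted({used for uses in dependencies.values() for used in uses})

def encode(uses: List[str], vocabulary: List[str]) -> List[int]:
    """Multi-hot vector: position i is 1 if the object uses vocabulary[i].

    Dependencies on objects outside the vocabulary are ignored, since the
    training data carries no information about them.
    """
    used = set(uses)
    return [1 if name in used else 0 for name in vocabulary]

# Hypothetical toy data: procedure -> objects it uses (assumed names).
dependencies = {
    "PRC_CREATE_INVOICE": ["TBL_INVOICE", "TBL_CUSTOMER"],
    "PRC_CANCEL_INVOICE": ["TBL_INVOICE"],
    "PRC_ADD_SUBSCRIBER": ["TBL_CUSTOMER", "TBL_SUBSCRIPTION"],
}
# Hypothetical module assignment taken from the initial version of the code.
modules = {
    "PRC_CREATE_INVOICE": "BILLING",
    "PRC_CANCEL_INVOICE": "BILLING",
    "PRC_ADD_SUBSCRIBER": "SUBSCRIPTION",
}

vocabulary = build_vocabulary(dependencies)
names = sorted(dependencies)
X = [encode(dependencies[name], vocabulary) for name in names]  # training features
y = [modules[name] for name in names]                           # training labels

# A new object is encoded the same way before asking the model for a module.
x_new = encode(["TBL_INVOICE", "TBL_PAYMENT"], vocabulary)
print(vocabulary, X, y, x_new)
```

The pairs `(X, y)` would then be passed to any multi-class classifier, for instance a neural network, and `x_new` to its prediction step. How unseen dependencies such as `TBL_PAYMENT` should be handled is a design choice; this sketch simply drops them.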


Change history

  • 16 April 2022

    The original version of this article was updated to correct the article title.

Notes

  1. http://www.oracle.com

  2. http://www.turkcell.com.tr

  3. https://scikit-learn.org/stable/modules/model_evaluation.html#accuracy-score

  4. https://scikit-learn.org/stable/modules/model_evaluation.html#multiclass-and-multilabel-classification

  5. https://github.com/ersinersoy/annplsqlclassification

Acknowledgements

We would like to thank software developers at Turkcell for sharing their code base with us and supporting our case studies. We would also like to thank the anonymous reviewers, who helped us to improve the quality of this paper significantly.

Author information

Corresponding author

Correspondence to Ersin Ersoy.

Ethics declarations

Conflicts of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Table 18 lists the ten highest accuracy values achieved with the CRM application datasets CRM-T, CRM-TV, CRM-TVPF and CRM-ALL. For each dataset, every cell of Table 18 has two parts. The first part shows the ANN hyperparameters that produced each of the top 10 results. The second part shows the results obtained with each of these settings.

Table 18 Top 10 accuracy values obtained with CRM case study datasets, and the corresponding ANN hyperparameters
Table 19 Top 10 accuracy values obtained with CMS case study datasets, and the corresponding ANN hyperparameters

In the second part, the first row lists the accuracy scores, the second row lists the macro-averaged precision, recall and F1 scores, and the third row lists the weighted precision, recall and F1 scores. Likewise, Table 19 lists the ten highest accuracy values achieved with the CMS application datasets CMS-T, CMS-TVPF and CMS-ALL.
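The difference between the accuracy, macro and weighted rows can be illustrated with a small standard-library sketch. It follows the usual definitions (macro: unweighted mean of the per-class scores; weighted: mean of the per-class scores weighted by class support), which is also how the scikit-learn pages cited in the notes describe them. The function names and the toy labels below are assumptions for illustration, not data from the case studies.

```python
from typing import Dict, List, Tuple

def per_class_scores(y_true: List[str], y_pred: List[str]) -> Dict[str, Tuple[float, float, float, int]]:
    """Return (precision, recall, f1, support) for every label seen."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        support = sum(1 for t in y_true if t == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1, support)
    return scores

def summarize(y_true: List[str], y_pred: List[str]) -> Dict[str, object]:
    """Accuracy plus macro and support-weighted (precision, recall, f1)."""
    scores = list(per_class_scores(y_true, y_pred).values())
    n = len(y_true)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    # Macro: every class counts equally, regardless of its size.
    macro = tuple(sum(s[i] for s in scores) / len(scores) for i in range(3))
    # Weighted: each class contributes in proportion to its support.
    weighted = tuple(sum(s[i] * s[3] for s in scores) / n for i in range(3))
    return {"accuracy": accuracy, "macro": macro, "weighted": weighted}

# Hypothetical predictions for four objects over two modules, A and B.
result = summarize(["A", "A", "A", "B"], ["A", "A", "B", "B"])
print(result)
```

In this toy case the accuracy is 0.75, the macro scores are about (0.75, 0.833, 0.733) and the weighted scores are about (0.875, 0.75, 0.767). Weighted recall always equals accuracy, which is why the macro row is the more informative one when module sizes are imbalanced.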

About this article

Cite this article

Ersoy, E., Sözer, H. Using artificial neural networks to provide guidance in extending PL/SQL programs. Software Qual J 30, 885–916 (2022). https://doi.org/10.1007/s11219-022-09586-1

  • DOI: https://doi.org/10.1007/s11219-022-09586-1
