Empirical Software Engineering, Volume 23, Issue 2, pp 570–644

Toward the development of a conventional time series based web error forecasting framework

Abstract

Web reliability is gaining importance because of the exponential growth in the popularity of social networks, e-mail systems and other online applications. To enhance the reliability of an existing web system, web administrators must know which web errors are present in the system, how various workload characteristics influence the manifestation of those errors, and how the workload characteristics relate to one another. In practice, however, it may not be possible to establish a generalized correspondence among workload characteristics. Moreover, the prediction and estimation of the cumulative occurrences of source content failures, and of the corresponding time between failures of a web system, have received little attention from the reliability research community. In this work, we therefore present a well-defined procedure (a forecasting framework) that web administrators can use to analyze and enhance the reliability of the web sites under their supervision. The framework first takes the HTTP access and error logs and extracts the necessary information about workloads, web errors and the corresponding time between failures. Next, we perform principal component analysis, correlation analysis and change point analysis to determine the number of independent variables. We then develop various time series based models to forecast the cumulative occurrences of source content failures and the corresponding time between failures. The multivariate models also incorporate several uncorrelated workload variables, together with exogenous and endogenous noise terms. The proposed methodology has been validated with usage statistics collected from the web sites of two highly renowned Indian academic institutions.
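The extraction and forecasting steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It makes three assumptions. First, it assumes an Apache 2.2-style error log. Second, it treats entries reading "File does not exist" as source content failures. Third, it substitutes a plain least-squares AR(1) model for the paper's univariate and multivariate time series models. The log format, the failure marker, the sample data and all function names are hypothetical. The principal component, correlation and change point steps are omitted.

```python
import re
from datetime import datetime
from statistics import mean

# Hypothetical Apache 2.2-style error-log line, e.g.
# [Wed Oct 11 14:32:52 2017] [error] [client 10.0.0.1] File does not exist: /var/www/a.html
LOG_PATTERN = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] (?:\[client [^\]]+\] )?(?P<msg>.*)$"
)
# Assumption (illustration only): these entries stand in for source content failures.
FAILURE_MARKER = "File does not exist"


def failure_times(lines):
    """Return the sorted timestamps of log lines treated as source content failures."""
    times = []
    for line in lines:
        m = LOG_PATTERN.match(line.strip())
        if m and m.group("msg").startswith(FAILURE_MARKER):
            times.append(datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y"))
    return sorted(times)


def time_between_failures(times):
    """Seconds elapsed between consecutive failures."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1} + e_t; returns (c, phi)."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    x_prev, x_next = series[:-1], series[1:]
    mp, mn = mean(x_prev), mean(x_next)
    var = sum((a - mp) ** 2 for a in x_prev)
    if var == 0:
        return mn, 0.0  # constant history: fall back to the mean
    cov = sum((a - mp) * (b - mn) for a, b in zip(x_prev, x_next))
    phi = cov / var
    return mn - phi * mp, phi


def forecast_ar1(series, steps=1):
    """Iterate the fitted AR(1) recursion to forecast the next `steps` values."""
    c, phi = fit_ar1(series)
    out, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out


# Hypothetical sample log: four failures and one unrelated notice.
SAMPLE_LOG = [
    "[Wed Oct 11 14:00:00 2017] [error] [client 10.0.0.1] File does not exist: /var/www/a.html",
    "[Wed Oct 11 14:00:10 2017] [error] [client 10.0.0.2] File does not exist: /var/www/b.html",
    "[Wed Oct 11 14:00:30 2017] [notice] Apache configured -- resuming normal operations",
    "[Wed Oct 11 14:00:40 2017] [error] [client 10.0.0.3] File does not exist: /var/www/c.html",
    "[Wed Oct 11 14:01:20 2017] [error] [client 10.0.0.1] File does not exist: /var/www/d.html",
]

if __name__ == "__main__":
    tbf = time_between_failures(failure_times(SAMPLE_LOG))
    print(tbf, forecast_ar1(tbf, steps=2))  # prints [10.0, 30.0, 40.0] [45.0, 47.5]
```

In this sketch the cumulative failure count at the k-th extracted timestamp is simply k, so the same recursion could be applied to a cumulative-count series. A fuller treatment would choose ARIMA or transfer-function model orders from the data rather than fixing an AR(1) structure. It would also add the selected workload variables as exogenous inputs.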

Keywords

Web Software Reliability, Univariate Time Series, Multivariate Time Series, Web Server, HTTP Logs, Forecasting

Notes

Acknowledgements

The authors are thankful to the National University of Singapore, the Singapore University of Technology and Design (in collaboration with MIT, USA) and The State University of New Jersey, Rutgers, for providing an excellent environment for completing this work. The constructive comments of the extremely learned associate editor and three enlightened anonymous reviewers are also gratefully acknowledged. The authors would like to express their heartfelt gratitude to Prof. Subhashis Chatterjee, Mr. Rajesh Mishra (Indian Institute of Technology Dhanbad), Prof. Amitava Dutta, Mr. Subhashis Kumar Pal and Mr. Ashish Biswas (Indian Statistical Institute) for providing the necessary data and valuable ideas.

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. CorpLab, Information Systems Technology and Design Pillar, Singapore University of Technology and Design, Singapore
  2. Department of Industrial and Systems Engineering, The State University of New Jersey, Rutgers, USA