Less is More: Temporal Fault Predictive Performance over Multiple Hadoop Releases

Conference paper
Search-Based Software Engineering (SSBSE 2014)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8636)

Abstract

We investigate search-based fault prediction over time using 8 consecutive Hadoop versions, aiming to analyse the impact of chronology on fault prediction performance. Our results confound the assumption, implicit in previous work, that additional information from historical versions improves prediction: though G-mean tends to improve, Recall can be reduced.
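
The contrast between G-mean and Recall can be made concrete with a small calculation. This sketch assumes the definition of G-mean common in the imbalanced-learning literature: the geometric mean of Recall (true positive rate on faulty modules) and Specificity (true negative rate on clean modules). This excerpt does not state the paper's exact formula. The `Confusion` class and all counts are hypothetical, chosen only to show that a model can gain G-mean while losing Recall. They are not results from the Hadoop study.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical binary confusion matrix; 'positive' means the module is faulty."""
    tp: int  # faulty modules predicted faulty
    fn: int  # faulty modules predicted clean (missed faults)
    tn: int  # clean modules predicted clean
    fp: int  # clean modules predicted faulty (false alarms)


def recall(c: Confusion) -> float:
    # Fraction of faulty modules the model catches (true positive rate).
    return c.tp / (c.tp + c.fn)


def specificity(c: Confusion) -> float:
    # Fraction of clean modules correctly left unflagged (true negative rate).
    return c.tn / (c.tn + c.fp)


def g_mean(c: Confusion) -> float:
    # Assumed definition: geometric mean of the two class-wise accuracies.
    return math.sqrt(recall(c) * specificity(c))


# Made-up counts for illustration only, not results from the paper:
# a model trained on less history vs. one trained on more history.
less_history = Confusion(tp=40, fn=10, tn=300, fp=150)
more_history = Confusion(tp=35, fn=15, tn=420, fp=30)

for name, c in [("less history", less_history), ("more history", more_history)]:
    print(f"{name}: recall={recall(c):.2f} "
          f"specificity={specificity(c):.2f} g_mean={g_mean(c):.2f}")
```

With these made-up counts, Recall falls from 0.80 to 0.70 while G-mean rises from about 0.73 to about 0.81. The five extra missed faults are outweighed by far fewer false alarms: Specificity rises from about 0.67 to about 0.93. Because faulty modules that go unflagged are usually the costlier error in fault prediction, a rising G-mean alone can mask a loss that matters.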

Author order is alphabetical.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Harman, M., Islam, S., Jia, Y., Minku, L.L., Sarro, F., Srivisut, K. (2014). Less is More: Temporal Fault Predictive Performance over Multiple Hadoop Releases. In: Le Goues, C., Yoo, S. (eds) Search-Based Software Engineering. SSBSE 2014. Lecture Notes in Computer Science, vol 8636. Springer, Cham. https://doi.org/10.1007/978-3-319-09940-8_19

  • DOI: https://doi.org/10.1007/978-3-319-09940-8_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-09939-2

  • Online ISBN: 978-3-319-09940-8

  • eBook Packages: Computer Science, Computer Science (R0)