Less is More: Temporal Fault Predictive Performance over Multiple Hadoop Releases

Harman, Mark; Islam, Syed; Jia, Yue; Minku, Leandro L.; Sarro, Federica; Srivisut, Komsan

doi:10.1007/978-3-319-09940-8_19


Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8636)

Included in the following conference series:

International Symposium on Search Based Software Engineering


Abstract

We investigate search-based fault prediction over time using 8 consecutive Hadoop versions, aiming to analyse the impact of chronology on fault prediction performance. Our results confound the assumption, implicit in previous work, that additional information from historical versions improves prediction: although G-mean tends to improve, Recall can be reduced.
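The abstract contrasts G-mean with Recall. As a minimal sketch, assuming the common definition of G-mean as the geometric mean of Recall and specificity (the abstract itself does not state the formula), the snippet below shows how one classifier can score a higher G-mean yet a lower Recall than another. The confusion counts and the names `model_a` and `model_b` are hypothetical illustrations, not data from the paper.

```python
import math


def recall(tp: int, fn: int) -> float:
    """Recall (true positive rate): share of faulty modules that are flagged."""
    return tp / (tp + fn) if tp + fn else 0.0


def specificity(tn: int, fp: int) -> float:
    """Specificity (true negative rate): share of non-faulty modules left unflagged."""
    return tn / (tn + fp) if tn + fp else 0.0


def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of recall and specificity (assumed common definition)."""
    return math.sqrt(recall(tp, fn) * specificity(tn, fp))


# Hypothetical confusion counts for 20 faulty and 80 non-faulty modules.
model_a = dict(tp=16, fn=4, tn=40, fp=40)  # flags aggressively: high recall, many false alarms
model_b = dict(tp=14, fn=6, tn=68, fp=12)  # flags selectively: lower recall, far fewer false alarms

for name, m in [("A", model_a), ("B", model_b)]:
    print(name, round(recall(m["tp"], m["fn"]), 3), round(g_mean(**m), 3))
```

Here model B misses two more faulty modules (Recall 0.7 versus 0.8) but avoids 28 false alarms, so its G-mean (about 0.771) exceeds model A's (about 0.632). Because G-mean rewards balance between the two classes, it can rise even when Recall falls.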

Author order is alphabetical.


References

Afzal, W., Torkar, R.: On the application of genetic programming for software engineering predictive modeling: A systematic review. Expert Systems with Applications 38(9), 11984–11997 (2011)
Arcuri, A., Briand, L.: A practical guide for using statistical tests to assess randomized algorithms in software engineering. In: ICSE, pp. 1–10 (2011)
Bouktif, S., Sahraoui, H., Antoniol, G.: Simulated annealing for improving software quality prediction. In: GECCO, vol. 2, pp. 1893–1900 (2006)
Chidamber, S.R., Kemerer, C.F.: A metrics suite for object oriented design. IEEE TSE 20(6), 476–493 (1994)
Di Martino, S., Ferrucci, F., Gravino, C., Sarro, F.: A genetic algorithm to configure support vector machines for predicting fault-prone components. In: Caivano, D., Oivo, M., Baldassarre, M.T., Visaggio, G. (eds.) PROFES 2011. LNCS, vol. 6759, pp. 247–261. Springer, Heidelberg (2011)
Elish, K.O., Elish, M.O.: Predicting defect-prone software modules using support vector machines. JSS 81(5), 649–660 (2008)
Ferrucci, F., Harman, M., Sarro, F.: Search based software project management. In: Ruhe, G., Wohlin, C. (eds.) Software Project Management in a Changing World, Springer (to appear, 2014)
Gondra, I.: Applying machine learning to software fault-proneness prediction. JSS 81(2), 186–195 (2008)
Hall, T., Beecham, S., Bowes, D., Gray, D., Counsell, S.: A systematic literature review on fault prediction performance in software engineering. IEEE TSE 38(6), 1276–1304 (2012)
Harman, M.: How SBSE can support construction and analysis of predictive models (keynote). In: PROMISE (2010)
Harman, M., Burke, E., Clark, J.A., Yao, X.: Dynamic adaptive search based software engineering. In: ESEM, pp. 1–8 (2012)
Harman, M., McMinn, P., de Souza, J.T., Yoo, S.: Search based software engineering: Techniques, taxonomy, tutorial. In: Meyer, B., Nordio, M. (eds.) LASER Summer School 2008-2010. LNCS, vol. 7007, pp. 1–59. Springer, Heidelberg (2012)
He, H., Garcia, E.A.: Learning from imbalanced data. IEEE TKDE 21(9), 1263–1284 (2009)
Krogmann, K., Kuperberg, M., Reussner, R.: Using genetic search for reverse engineering of parametric behaviour models for performance prediction. IEEE TSE 36(6), 865–877 (2010)
Minku, L., Yao, X.: Can cross-company data improve performance in software effort estimation? In: PROMISE, pp. 69–78 (2012)
Minku, L., Yao, X.: How to make best use of cross-company data in software effort estimation? In: ICSE, pp. 446–456 (2014)
Ostrand, T.J., Weyuker, E.J.: How to measure success of fault prediction models. In: SOQUA 2007, pp. 25–30. ACM (2007)
Rodríguez, D., Ruiz, R., Riquelme, J.C., Harrison, R.: Subgroup discovery for defect prediction. In: Cohen, M.B., Ó Cinnéide, M. (eds.) SSBSE 2011. LNCS, vol. 6956, pp. 269–270. Springer, Heidelberg (2011)
Sarro, F., Di Martino, S., Ferrucci, F., Gravino, C.: A further analysis on the use of genetic algorithm to configure support vector machines for inter-release fault prediction. In: ACM-SAC, pp. 1215–1220 (2012)


Author information

Authors and Affiliations

CREST, University College London, UK
Mark Harman, Syed Islam, Yue Jia & Federica Sarro
CERCIA, University of Birmingham, UK
Leandro L. Minku
Department of Computer Science, University of York, UK
Komsan Srivisut


Editor information

Editors and Affiliations

School of Computer Science, Institute for Software Research, Carnegie Mellon University, 5000 Forbes Avenue, 15213, Pittsburgh, PA, USA
Claire Le Goues
Department of Computer Science, University College London, Gower Street, WC1E 6BT, London, UK
Shin Yoo


About this paper

Cite this paper

Harman, M., Islam, S., Jia, Y., Minku, L.L., Sarro, F., Srivisut, K. (2014). Less is More: Temporal Fault Predictive Performance over Multiple Hadoop Releases. In: Le Goues, C., Yoo, S. (eds) Search-Based Software Engineering. SSBSE 2014. Lecture Notes in Computer Science, vol 8636. Springer, Cham. https://doi.org/10.1007/978-3-319-09940-8_19


DOI: https://doi.org/10.1007/978-3-319-09940-8_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09939-2
Online ISBN: 978-3-319-09940-8
eBook Packages: Computer Science, Computer Science (R0)
