
Automatic Test Data Generation for a Given Set of Applications Using Recurrent Neural Networks

  • Ciprian Paduraru
  • Marius-Constantin Melemciuc
  • Miruna Paduraru
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1077)

Abstract

To address the problem of automatically testing software against vulnerabilities, our work focuses on creating a tool capable of assisting users in generating test sets for multiple programs under test at the same time. Starting from an initial set of inputs in a corpus folder, the tool clusters the inputs by the type of application they target, then produces a generative model for each of these clusters. The models belong to the recurrent neural network architecture class, and we used the Tensorflow framework for training and inference. The tool supports online learning, so the models improve as new inputs for each application cluster are added to the corpus folder. Users interact with the tool through an interface similar to that of expert systems: they can customize various parameters exposed per cluster, or override various function hooks for learning and inference, gaining finer control over the tool's backend. As the evaluation section shows, the tool can be useful for creating substantial sets of new inputs, with good code coverage quality and fewer resources consumed.
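The pipeline described above can be illustrated with a minimal sketch. All names here (`cluster_by_type`, `ClusterModel`, `build_models`) are hypothetical and not part of the tool's actual API; the per-cluster RNN is replaced by a trivial sample-recombination placeholder, since only the structure (clustering by target type, per-cluster models, overridable learn/infer hooks) is what the abstract describes.

```python
# Hypothetical sketch of the per-cluster generation pipeline: inputs are
# clustered by target application type, and each cluster gets its own
# generative model whose learn/infer hooks users may override.
from collections import defaultdict


def cluster_by_type(corpus):
    """Group corpus entries (name, payload) by a crude type key (extension)."""
    clusters = defaultdict(list)
    for name, payload in corpus:
        key = name.rsplit(".", 1)[-1] if "." in name else "raw"
        clusters[key].append(payload)
    return dict(clusters)


class ClusterModel:
    """Stand-in for the per-cluster RNN; hooks mimic the expert-system-style
    customization points (override learn_hook / infer_hook for finer control)."""

    def __init__(self, learn_hook=None, infer_hook=None):
        self.samples = []
        self.learn_hook = learn_hook or self._default_learn
        self.infer_hook = infer_hook or self._default_infer

    def _default_learn(self, payloads):
        # Online learning placeholder: simply accumulate new corpus inputs.
        self.samples.extend(payloads)

    def _default_infer(self, n):
        # Trivial placeholder generator: splice halves of known samples.
        out = []
        for i in range(n):
            a = self.samples[i % len(self.samples)]
            b = self.samples[(i + 1) % len(self.samples)]
            out.append(a[: len(a) // 2] + b[len(b) // 2:])
        return out


def build_models(corpus):
    """Train one model per input cluster, as the tool does per target type."""
    models = {}
    for key, payloads in cluster_by_type(corpus).items():
        model = ClusterModel()
        model.learn_hook(payloads)
        models[key] = model
    return models
```

In the actual tool, `_default_learn` and `_default_infer` would wrap Tensorflow training and sampling of an LSTM-style model; the sketch only shows where those hooks sit in the pipeline.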

Keywords

Fuzz testing · Recurrent neural networks · LSTM · Tensorflow · Pipeline · Taint analysis

Notes

Acknowledgments

This work was supported by a grant of the Romanian Ministry of Research and Innovation, CCCDI-UEFISCDI, project no. 17PCCDI/2018. We would like to thank our colleagues Teodor Stoenescu and Alexandra Sandulescu from Bitdefender, and Alin Stefanescu from the University of Bucharest, for fruitful discussions and collaboration.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. The Research Institute of the University of Bucharest (ICUB), University of Bucharest, Bucharest, Romania
  2. Department of Computing Science, University of Bucharest, Bucharest, Romania
