Automatic Test Data Generation for a Given Set of Applications Using Recurrent Neural Networks

  • Ciprian Paduraru
  • Marius-Constantin Melemciuc
  • Miruna Paduraru
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1077)


To address the problem of automatically testing software for vulnerabilities, our work focuses on creating a tool that helps users automatically generate test sets for multiple programs under test at the same time. Starting with an initial set of inputs in a corpus folder, the tool clusters the inputs by their target application type, then produces a generative model for each cluster. The models belong to the recurrent neural network class of architectures, and we used the Tensorflow framework to train them and run inference. The tool supports online learning, so the models can keep improving as new inputs for each application cluster are added to the corpus folder. Users interact with the tool much as they would with an expert system: they can customize various parameters exposed per cluster, or override function hooks for model learning and inference, to gain finer control over the tool's backend. As the evaluation section shows, the tool can be useful for creating large sets of new inputs that achieve good code coverage with lower resource consumption.
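The abstract outlines a per-cluster pipeline: group corpus inputs by target application, keep one generative model per group, update each model online as new inputs arrive, and let users override per-cluster parameters and hooks. The sketch below illustrates only that flow, under stated assumptions. The corpus is given as in-memory (target, input) pairs, since how the tool determines an input's target type is not described here. A character-level n-gram Markov model is swapped in for the Tensorflow RNN so the example runs with the Python standard library alone. `ClusterConfig`, its `preprocess`/`postprocess` hooks, `NGramModel`, and `Pipeline` are hypothetical names, not the tool's actual API.

```python
import random
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple

START, END = "\x02", "\x03"  # sentinel symbols marking the start/end of an input (assumption)


@dataclass
class ClusterConfig:
    """Hypothetical per-cluster parameters and hooks a user may override."""
    order: int = 3          # context length of the stand-in model (must be >= 1)
    max_len: int = 200      # upper bound on the length of a generated input
    # Hook run on each corpus input before learning (default: identity).
    preprocess: Callable[[str], str] = lambda s: s
    # Hook run on each generated input after inference (default: identity).
    postprocess: Callable[[str], str] = lambda s: s


class NGramModel:
    """Character-level Markov model standing in for a per-cluster RNN."""

    def __init__(self, order: int) -> None:
        self.order = order
        self.counts: Dict[str, Counter] = defaultdict(Counter)

    def update(self, samples: Iterable[str]) -> None:
        # Online learning: new samples are added to the existing counts,
        # so the model is refined without retraining from scratch.
        for s in samples:
            padded = START * self.order + s + END
            for i in range(self.order, len(padded)):
                self.counts[padded[i - self.order:i]][padded[i]] += 1

    def sample(self, rng: random.Random, max_len: int) -> str:
        # Draw one symbol at a time, weighted by how often it followed the context.
        context, out = START * self.order, []
        while len(out) < max_len:
            dist = self.counts.get(context)
            if not dist:
                break
            symbols, weights = zip(*dist.items())
            ch = rng.choices(symbols, weights=weights)[0]
            if ch == END:
                break
            out.append(ch)
            context = context[1:] + ch
        return "".join(out)


class Pipeline:
    """Clusters corpus inputs by target application, one model per cluster."""

    def __init__(self, configs: Optional[Dict[str, ClusterConfig]] = None, seed: int = 0) -> None:
        self.configs: Dict[str, ClusterConfig] = configs or {}
        self.models: Dict[str, NGramModel] = {}
        self.rng = random.Random(seed)

    def add_inputs(self, corpus: Iterable[Tuple[str, str]]) -> None:
        # Group (target, input) pairs into clusters keyed by target application.
        clusters: Dict[str, List[str]] = defaultdict(list)
        for target, data in corpus:
            clusters[target].append(data)
        # Create a model for unseen clusters, then update every touched model.
        for target, samples in clusters.items():
            cfg = self.configs.setdefault(target, ClusterConfig())
            model = self.models.setdefault(target, NGramModel(cfg.order))
            model.update(cfg.preprocess(s) for s in samples)

    def generate(self, target: str, n: int) -> List[str]:
        cfg, model = self.configs[target], self.models[target]
        return [cfg.postprocess(model.sample(self.rng, cfg.max_len)) for _ in range(n)]


if __name__ == "__main__":
    # "xml_parser" and "json_parser" are hypothetical target names.
    pipe = Pipeline({"xml_parser": ClusterConfig(order=2, postprocess=str.strip)})
    pipe.add_inputs([("xml_parser", "<a>1</a>"), ("xml_parser", "<b>2</b>"),
                     ("json_parser", '{"k": 1}')])
    print(pipe.generate("xml_parser", 3))
    # A new corpus input for one cluster only updates that cluster's model.
    pipe.add_inputs([("json_parser", '{"k": [1, 2]}')])
    print(pipe.generate("json_parser", 2))
```

The point of the structure is that each cluster owns its model and configuration, so adding inputs for one target touches only that target's model. Replacing `NGramModel` with an LSTM that exposes the same `update`/`sample` interface would leave the clustering and hook logic unchanged.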


Keywords: Fuzz testing, Recurrent neural networks, LSTM, Tensorflow, Pipeline, Taint analysis



This work was supported by a grant of the Romanian Ministry of Research and Innovation, CCCDI-UEFISCDI, project no. 17PCCDI/2018. We would like to thank our colleagues Teodor Stoenescu and Alexandra Sandulescu from Bitdefender, and Alin Stefanescu from the University of Bucharest, for fruitful discussions and collaboration.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. The Research Institute of the University of Bucharest (ICUB), University of Bucharest, Bucharest, Romania
  2. Department of Computing Science, University of Bucharest, Bucharest, Romania
