FOREST: An Interactive Multi-tree Synthesizer for Regular Expressions

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 12651)

Abstract

Form validators based on regular expressions are often used on digital forms to prevent users from inserting data in the wrong format. However, writing these validators can pose a challenge to some users.
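A regex-based form validator of this kind can be sketched with Python's standard `re` module. The field format (`dd/mm/yyyy` dates), the pattern, and the name `is_well_formed` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical format for a date field: two digits, '/', two digits, '/', four digits.
DATE_FORMAT = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")


def is_well_formed(value: str) -> bool:
    """Return True if the entire value matches the expected date format."""
    # fullmatch anchors the pattern at both ends, so "01/02/20200" is rejected.
    return DATE_FORMAT.fullmatch(value) is not None


if __name__ == "__main__":
    for value in ["01/02/2020", "1/2/2020", "01-02-2020", "01/02/20200"]:
        print(value, is_well_formed(value))
```

A pattern like this checks only the shape of the input: it still accepts `99/99/2020`. This is why the approach described in the abstract pairs the regular expression with conditions over capturing groups for integer values.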

We present Forest, a regular expression synthesizer for digital form validations. Forest produces a regular expression that matches the desired pattern for the input values, together with a set of conditions over capturing groups that ensure the validity of integer values in the input. Our synthesis procedure is based on enumerative search and uses a Satisfiability Modulo Theories (SMT) solver to explore and prune the search space. We propose a novel representation for regular expression synthesis, multi-tree, which induces patterns in the examples and uses them to split the problem through a divide-and-conquer approach. We also present a new SMT encoding to synthesize capture conditions for a given regular expression. To increase confidence in the synthesized regular expression, we implement user interaction based on distinguishing inputs.
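The kind of result described above, a regular expression plus conditions over its capturing groups, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the date format, group indices, bounds, and the names `validate` and `distinguishing_input` are chosen for illustration and are not Forest's output syntax. The `distinguishing_input` helper illustrates what a distinguishing input is: a string on which two candidate regular expressions that both fit the examples disagree, so the user's answer on it rules one of them out. It finds such a string by naive brute-force enumeration over a small alphabet, a stand-in chosen for simplicity rather than Forest's own procedure.

```python
import re
from itertools import product

# Hypothetical regular expression for "dd/mm/yyyy" dates; each integer
# field is a capturing group (groups 1, 2 and 3).
DATE_PATTERN = re.compile(r"([0-9]{2})/([0-9]{2})/([0-9]{4})")

# Hypothetical capture conditions: (group index, min, max), bounds inclusive.
DATE_CONDITIONS = [(1, 1, 31), (2, 1, 12)]


def validate(value, pattern=DATE_PATTERN, conditions=DATE_CONDITIONS):
    """Accept value only if it fully matches pattern and every capture condition holds."""
    match = pattern.fullmatch(value)
    if match is None:
        return False
    # Convert each constrained group to an integer and check its range.
    return all(low <= int(match.group(g)) <= high for g, low, high in conditions)


def distinguishing_input(regex_a, regex_b, alphabet, max_len):
    """Return the first string (shortest first) over alphabet, up to max_len
    characters, that exactly one of the two regexes fully matches; None if none.

    Brute force: tries len(alphabet) ** n strings of each length n.
    """
    a, b = re.compile(regex_a), re.compile(regex_b)
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if (a.fullmatch(candidate) is None) != (b.fullmatch(candidate) is None):
                return candidate
    return None


if __name__ == "__main__":
    print(validate("31/12/2020"))  # True
    print(validate("99/99/2020"))  # False: day and month out of range
    # Both a+b and a*b match the examples "ab" and "aab"; "b" separates them.
    print(distinguishing_input(r"a+b", r"a*b", "ab", 3))  # b
```

Both `a+b` and `a*b` are consistent with the examples `ab` and `aab`, but they disagree on `b`; asking the user whether `b` should be accepted decides between them. The simple range checks above also still accept `31/02/2020`, since each condition constrains one group in isolation.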

We evaluated Forest on real-world form-validation instances using regular expressions. Experimental results show that Forest successfully returns the desired regular expression in 70% of the instances and outperforms Regel, a state-of-the-art regular expression synthesizer.

This work was supported by NSF award CCF-1762363 and through FCT under project UIDB/50021/2020, and project ANI 045917 funded by FEDER and FCT.

References

  1. Chen, Q., Wang, X., Ye, X., Durrett, G., Dillig, I.: Multi-modal synthesis of regular expressions. In: PLDI. ACM (2020)

  2. Chen, Y., Martins, R., Feng, Y.: Maximal multi-layer specification synthesis. In: ESEC/SIGSOFT FSE. pp. 602–612. ACM (2019)

  3. Fedyukovich, G., Gupta, A.: Functional synthesis with examples. In: CP. Lecture Notes in Computer Science, vol. 11802, pp. 547–564. Springer (2019)

  4. Feng, Y., Martins, R., Bastani, O., Dillig, I.: Program synthesis using conflict-driven learning. In: PLDI. pp. 420–435. ACM (2018)

  5. Feng, Y., Martins, R., Geffen, J.V., Dillig, I., Chaudhuri, S.: Component-based synthesis of table consolidation and transformation tasks from examples. In: PLDI. pp. 422–436. ACM (2017)

  6. Golia, P., Roy, S., Meel, K.S.: Manthan: A data driven approach for boolean function synthesis. In: CAV. Springer (2020)

  7. Gulwani, S.: Automating string processing in spreadsheets using input-output examples. In: POPL. pp. 317–330. ACM (2011)

  8. Kini, D., Gulwani, S.: Flashnormalize: Programming by examples for text normalization. In: IJCAI. pp. 776–783. AAAI Press (2015)

  9. Kushman, N., Barzilay, R.: Using semantic unification to generate regular expressions from natural language. In: HLT-NAACL. pp. 826–836. The Association for Computational Linguistics (2013)

  10. Lee, M., So, S., Oh, H.: Synthesizing regular expressions from examples for introductory automata assignments. In: GPCE. pp. 70–80. ACM (2016)

  11. Li, H., Chan, C., Maier, D.: Query from examples: An iterative, data-driven approach to query construction. Proc. VLDB Endow. 8(13), 2158–2169 (2015)

  12. Locascio, N., Narasimhan, K., DeLeon, E., Kushman, N., Barzilay, R.: Neural generation of regular expressions from natural language with minimal domain knowledge. In: EMNLP. pp. 1918–1923. The Association for Computational Linguistics (2016)

  13. Martins, R., Chen, J., Chen, Y., Feng, Y., Dillig, I.: Trinity: An Extensible Synthesis Framework for Data Science. PVLDB 12(12), 1914–1917 (2019)

  14. Mayer, M., Soares, G., Grechkin, M., Le, V., Marron, M., Polozov, O., Singh, R., Zorn, B.G., Gulwani, S.: User interaction models for disambiguation in programming by example. In: UIST. pp. 291–301. ACM (2015)

  15. de Moura, L.M., Bjørner, N.: Z3: an efficient SMT solver. In: TACAS. Lecture Notes in Computer Science, vol. 4963, pp. 337–340. Springer (2008)

  16. Orvalho, P., Terra-Neves, M., Ventura, M., Martins, R., Manquinho, V.M.: Squares. https://squares-sql.github.io, accessed on May 27, 2020

  17. Orvalho, P., Terra-Neves, M., Ventura, M., Martins, R., Manquinho, V.M.: Encodings for enumeration-based program synthesis. In: CP. Lecture Notes in Computer Science, vol. 11802, pp. 583–599. Springer (2019)

  18. Python Software Foundation: Python3’s regular expression module re. https://docs.python.org/3/library/re.html, accessed on October 11, 2020

  19. Raza, M., Gulwani, S.: Automated data extraction using predictive program synthesis. In: AAAI. pp. 882–890. AAAI Press (2017)

  20. Regular Expression Library: www.regexlib.com, accessed on May 27, 2020

  21. Reynolds, A., Barbosa, H., Nötzli, A., Barrett, C.W., Tinelli, C.: cvc4sy: Smart and fast term enumeration for syntax-guided synthesis. In: CAV. Lecture Notes in Computer Science, vol. 11562, pp. 74–83. Springer (2019)

  22. Solar-Lezama, A.: Program sketching. Int. J. Softw. Tools Technol. Transf. 15(5-6), 475–495 (2013)

  23. Stanford, C., Veanes, M., Bjørner, N.: Symbolic boolean derivatives for efficiently solving extended regular expression constraints. Tech. Rep. MSR-TR-2020-25, Microsoft (August 2020), updated November 2020

  24. Wang, C., Cheung, A., Bodík, R.: Interactive query synthesis from input-output examples. In: SIGMOD Conference. pp. 1631–1634. ACM (2017)

  25. Wang, C., Cheung, A., Bodík, R.: Synthesizing highly expressive SQL queries from input-output examples. In: PLDI. pp. 452–466. ACM (2017)

  26. Wang, X., Gulwani, S., Singh, R.: FIDEX: filtering spreadsheet data using examples. In: OOPSLA. pp. 195–213. ACM (2016)

  27. Zhong, Z., Guo, J., Yang, W., Peng, J., Xie, T., Lou, J., Liu, T., Zhang, D.: Semregex: A semantics-based approach for generating regular expressions from natural language specifications. In: EMNLP. pp. 1608–1618. Association for Computational Linguistics (2018)

Author information

Correspondence to Margarida Ferreira.

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Copyright information

© 2021 The Author(s)

About this paper

Cite this paper

Ferreira, M., Terra-Neves, M., Ventura, M., Lynce, I., Martins, R. (2021). FOREST: An Interactive Multi-tree Synthesizer for Regular Expressions. In: Groote, J.F., Larsen, K.G. (eds) Tools and Algorithms for the Construction and Analysis of Systems. TACAS 2021. Lecture Notes in Computer Science, vol 12651. Springer, Cham. https://doi.org/10.1007/978-3-030-72016-2_9

  • DOI: https://doi.org/10.1007/978-3-030-72016-2_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-72015-5

  • Online ISBN: 978-3-030-72016-2

  • eBook Packages: Computer Science, Computer Science (R0)