International Conference on Tools and Algorithms for the Construction and Analysis of Systems

TACAS 2021: Tools and Algorithms for the Construction and Analysis of Systems, pp. 152–169

FOREST: An Interactive Multi-tree Synthesizer for Regular Expressions

  • Margarida Ferreira (1, 2)
  • Miguel Terra-Neves (2)
  • Miguel Ventura (2)
  • Inês Lynce (1)
  • Ruben Martins (3)

  • Conference paper
  • Open Access
  • First Online: 20 March 2021
  • 2029 Accesses
  • 1 Citation

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 12651)

Abstract

Form validators based on regular expressions are often used on digital forms to prevent users from inserting data in the wrong format. However, writing these validators can pose a challenge to some users.
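As a minimal illustration of such a validator, a form field expecting dates written as `dd/mm/yyyy` could be checked with Python's `re` module. The date format, the pattern, and the function name `is_valid_date_format` are hypothetical examples, not taken from the paper.

```python
import re

# Hypothetical field format: two digits, slash, two digits, slash, four digits.
DATE_FORMAT = re.compile(r"\d{2}/\d{2}/\d{4}")


def is_valid_date_format(value: str) -> bool:
    """Accept the value only if the whole string matches the pattern."""
    # fullmatch anchors the pattern at both ends, so trailing junk is rejected.
    return DATE_FORMAT.fullmatch(value) is not None


for value in ["01/02/2020", "1/2/2020", "01-02-2020", "31/12/1999x"]:
    print(value, is_valid_date_format(value))
```

A pattern like this checks only the format: it also accepts `99/99/2020`, which is the kind of gap that conditions over capturing groups on integer values are meant to close.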

We present Forest, a regular expression synthesizer for digital form validations. Forest produces a regular expression that matches the desired pattern for the input values and a set of conditions over capturing groups that ensure the validity of integer values in the input. Our synthesis procedure is based on enumerative search and uses a Satisfiability Modulo Theories (SMT) solver to explore and prune the search space. We propose a novel representation for regular expression synthesis, multi-tree, which induces patterns in the examples and uses them to split the problem through a divide-and-conquer approach. We also present a new SMT encoding to synthesize capture conditions for a given regular expression. To increase confidence in the synthesized regular expression, we implement user interaction based on distinguishing inputs.
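The kind of validator described above, a regular expression together with integer conditions over its capturing groups, and the idea of a distinguishing input between two candidate validators can be sketched in Python as follows. This is a hypothetical illustration under stated assumptions: the pattern, the bounds, and all names (`Validator`, `accepts`, `find_distinguishing_input`, `date_like_strings`) are invented here and are not Forest's output format or API, and the distinguishing input is found by brute force over a small hand-picked pool rather than by the paper's procedure.

```python
import re
from dataclasses import dataclass, field
from itertools import product
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

# A condition receives the captured substrings and returns True or False.
Condition = Callable[[Tuple[str, ...]], bool]


@dataclass
class Validator:
    """Hypothetical validator: a regex plus integer conditions over its capturing groups."""
    pattern: str
    conditions: List[Condition] = field(default_factory=list)

    def accepts(self, value: str) -> bool:
        match = re.fullmatch(self.pattern, value)
        if match is None:
            return False  # wrong format: rejected by the regex alone
        groups = match.groups()  # captured substrings, e.g. ("31", "12", "2020")
        return all(cond(groups) for cond in self.conditions)


# Candidate A: format only, so it also accepts impossible dates such as 32/13/2020.
format_only = Validator(r"(\d{2})/(\d{2})/(\d{4})")
# Candidate B: same pattern plus bounds on the day (group 1) and month (group 2).
with_bounds = Validator(
    r"(\d{2})/(\d{2})/(\d{4})",
    [lambda g: 1 <= int(g[0]) <= 31, lambda g: 1 <= int(g[1]) <= 12],
)


def find_distinguishing_input(a: Validator, b: Validator,
                              candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate accepted by exactly one of the two validators."""
    for s in candidates:
        if a.accepts(s) != b.accepts(s):
            return s
    return None  # no candidate in the pool tells the two validators apart


def date_like_strings() -> Iterator[str]:
    """Small, hypothetical pool of dd/mm/2020 strings to search through."""
    for day, month in product(["01", "15", "31", "32"], ["01", "12", "13"]):
        yield f"{day}/{month}/2020"


print(format_only.accepts("32/13/2020"))  # True
print(with_bounds.accepts("32/13/2020"))  # False
print(find_distinguishing_input(format_only, with_bounds, date_like_strings()))  # 01/13/2020
```

In an interactive setting, such an input would presumably be shown to the user, whose answer (valid or invalid) rules out one of the two candidates. The exhaustive loop over a fixed pool is a simplification chosen only to keep the sketch self-contained; Forest synthesizes capture conditions with an SMT encoding, as stated above.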

We evaluated Forest on real-world form-validation instances using regular expressions. Experimental results show that Forest successfully returns the desired regular expression in 70% of the instances and outperforms Regel, a state-of-the-art regular expression synthesizer.

This work was supported by NSF award CCF-1762363 and through FCT under project UIDB/50021/2020, and project ANI 045917 funded by FEDER and FCT.

Author information

Authors and Affiliations

  1. INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal

    Margarida Ferreira & Inês Lynce

  2. OutSystems, Linda-a-Velha, Portugal

    Margarida Ferreira, Miguel Terra-Neves & Miguel Ventura

  3. Carnegie Mellon University, Pittsburgh, USA

    Ruben Martins

Corresponding author

Correspondence to Margarida Ferreira.

Editor information

Editors and Affiliations

  1. Eindhoven University of Technology, Eindhoven, The Netherlands

    Prof. Jan Friso Groote

  2. Aalborg University, Aalborg East, Denmark

    Prof. Kim Guldstrand Larsen

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Copyright information

© 2021 The Author(s)

About this paper

Cite this paper

Ferreira, M., Terra-Neves, M., Ventura, M., Lynce, I., Martins, R. (2021). FOREST: An Interactive Multi-tree Synthesizer for Regular Expressions. In: Groote, J.F., Larsen, K.G. (eds) Tools and Algorithms for the Construction and Analysis of Systems. TACAS 2021. Lecture Notes in Computer Science, vol 12651. Springer, Cham. https://doi.org/10.1007/978-3-030-72016-2_9

  • DOI: https://doi.org/10.1007/978-3-030-72016-2_9

  • Published: 20 March 2021

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-72015-5

  • Online ISBN: 978-3-030-72016-2

  • eBook Packages: Computer Science, Computer Science (R0)

Published in cooperation with The European Joint Conferences on Theory and Practice of Software (http://www.etaps.org/).
