Skip to main content
Log in

ProSy: API-Based Synthesis with Probabilistic Model

  • Regular Paper
  • Published:
Journal of Computer Science and Technology Aims and scope Submit manuscript

Abstract

Program synthesis is an exciting topic that desires to generate programs satisfying user intent automatically. But in most cases, only small programs for simple or domain-specific tasks can be synthesized. The major obstacle of synthesis lies in the huge search space. A common practice in addressing this problem is using a domain-specific language, while many approaches still wish to synthesize programs in general programming languages. With the rapid growth of reusable libraries, component-based synthesis provides a promising way, such as synthesizing Java programs which are only composed of APIs (application programming interfaces). However, the efficiency of searching for proper solutions for complex tasks is still a challenge. Given an unfamiliar programming task, programmers would search for API usage knowledge from various coding resources to reduce the search space. Considering this, we propose a novel approach named ProSy to synthesize API-based programs in Java. The key novelty is to retrieve related knowledge from Javadoc and Stack Overflow and then construct a probabilistic reachability graph. It assigns higher probabilities to APIs that are more likely to be used in implementing the given task. In the synthesis process, the program sketch with a higher probability will be considered first; thus, the number of explored reachable paths would be decreased. Some extension and optimization strategies are further studied in the paper. We implement our approach and conduct several experiments on it. We compare ProSy with SyPet and other state-of-the-art API-based synthesis approaches. The experimental results show that ProSy reduces the synthesis time of SyPet by up to 80%.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Gulwani S, Polozov O, Singh R. Program synthesis. Foundations and Trends in Programming Languages, 2017, 4(1/2): 1-119.

  2. Pnueli A, Rosner R. On the synthesis of a reactive module. In Proc. the 16th Annual ACM Symposium on Principles of Programming Languages, January 1989, pp.179-190.

  3. Gulwani S, Jain P. Programming by examples: PL meets ML. In Proc. the 15th Asian Symposium on Programming Languages and Systems, November 2017, pp.3-20.

  4. Alur R, Bodík R, Juniwal G, Martin M M K, Raghothaman M, Seshia S A, Singh R, Solar-Lezama A, Torlak E, Udupa A. Syntax-guided synthesis. In Proc. the 13th International Conference on Formal Methods in Computer-Aided Design, October 2013, pp.1-8.

  5. Gulwani S, Jha S, Tiwari A, Venkatesan R. Synthesis of loop-free programs. In Proc. the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2011, pp.62-73.

  6. Solar-Lezama A. Program sketching. Int. J. Softw. Tools Technol. Transf., 2013, 15(5/6): 475-495.

    Article  Google Scholar 

  7. Parisotto E, Mohamed A, Singh R, Li L, Zhou D, Kohli P. Neuro-symbolic program synthesis. In Proc. the 5th International Conference on Learning Representations, April 2017.

  8. Balog M, Gaunt A L, Brockschmidt M, Nowozin S, Tarlow D. DeepCoder: Learning to write programs. In Proc. the 5th International Conference on Learning Representations, April 2017.

  9. Perelman D, Gulwani S, Grossman D, Provost P. Test-driven synthesis. In Proc. the ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2014, pp.408-418.

  10. Feng Y, Martins R, Geffen J V, Dillig I, Chaudhuri S. Component-based synthesis of table consolidation and transformation tasks from examples. In Proc. the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2017, pp.422-436.

  11. Jha S, Gulwani S, Seshia S A, Tiwari A. Oracle-guided component-based program synthesis. In Proc. the 32nd ACM/IEEE International Conference on Software Engineering, May 2010, pp.215-224.

  12. Feng Y, Martins R, Wang Y, Dillig I, Reps T W. Component-based synthesis for complex APIs. In Proc. the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, January 2017, pp.599-612.

  13. Shi K, Steinhardt J, Liang P. FrAngel: Component-based synthesis with control structures. Proc. the ACM on Programming Languages, 2019, 3(POPL): Article No. 73.

  14. Mei H, Zhang L. Can big data bring a breakthrough for software automation? SCIENCE CHINA Information Sciences, 2018, 61(5): Article No. 056101.

  15. Petri C A, Reisig W. Petri net. Scholarpedia, 2008, 3(4): Article No. 6477.

  16. Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv:1301.3781, 2013. https://arxiv.org/abs/1301.3781, Sept. 2020.

  17. Pennington J, Socher R, Manning C D. Glove: Global vectors for word representation. In Proc. the 2014 Conference on Empirical Methods in Natural Language Processing, October 2014, pp.1532-1543.

  18. Eén N, Sörensson N. Translating pseudo-boolean constraints into SAT. J. Satisf. Boolean Model. Comput., 2006, 2(1/2/3/4): 1-26.

    MATH  Google Scholar 

  19. Darwin I F. Java Cookbook. O’Reilly, 2001.

  20. Murali V, Qi L, Chaudhuri S, Jermaine C. Neural sketch learning for conditional program generation. In Proc. the 6th International Conference on Learning Representations, April 2018.

  21. Gulwani S. Automating string processing in spreadsheets using input-output examples. In Proc. the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 2011, pp.317-330.

  22. Singh R, Gulwani S. Learning semantic string transformations from examples. Proc. the VLDB Endowment, 2012, 5(8): 740-751.

    Article  Google Scholar 

  23. Albarghouthi A, Gulwani S, Kincaid Z. Recursive program synthesis. In Proc. the 25th International Conference on Computer Aided Verification, July 2013, pp.934-950.

  24. Singh R, Gulwani S. Synthesizing number transformations from input-output examples. In Proc. the 24th International Conference on Computer Aided Verification, July 2012, pp.634-651.

  25. Gulwani S, Harris WR, Singh R. Spreadsheet data manipulation using examples. Communications of the ACM, 2012, 55(8): 97-105.

    Article  Google Scholar 

  26. Gulwani S, Korthikanti V A, Tiwari A. Synthesizing geometry constructions. In Proc. the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2011, pp.50-61.

  27. Feser J K, Chaudhuri S, Dillig I. Synthesizing data structure transformations from input-output examples. In Proc. the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2015, pp.229-239.

  28. Kalyan A, Mohta A, Polozov O, Batra D, Jain P, Gulwani S. Neural-guided deductive search for real-time program synthesis from examples. In Proc. the 6th International Conference on Learning Representations, April 2018.

  29. Ling W, Blunsom P, Grefenstette E, Hermann K M, Kociský T, Wang F, Senior A W. Latent predictor networks for code generation. In Proc. the 54th Annual Meeting of the Association for Computational Linguistics, August 2016.

  30. Yin P, Neubig G. A syntactic neural model for general purpose code generation. In Proc. the 55th Annual Meeting of the Association for Computational Linguistics, July 2017, pp.440-450.

  31. Rabinovich M, Stern M, Klein D. Abstract syntax networks for code generation and semantic parsing. In Proc. the 55th Annual Meeting of the Association for Computational Linguistics, July 2017, pp.1139-1149.

  32. Lee W, Heo K, Alur R, Naik M. Accelerating search-based program synthesis using learned probabilistic models. In Proc. the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2018, pp.436-449.

  33. Chen Y, Martins R, Feng Y. Maximal multi-layer specification synthesis. In Proc. the ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, August 2019, pp.602-612.

  34. Linstead E, Bajracharya S K, Ngo T C, Rigor P, Lopes C V, Baldi P. Sourcerer: Mining and searching Internet-scale software repositories. Data Min. Knowl. Discov., 2009, 18(2): 300-336.

    Article  MathSciNet  Google Scholar 

  35. McMillan C, Grechanik M, Poshyvanyk D, Xie Q, Fu C. Portfolio: Finding relevant functions and their usage. In Proc. the 33rd International Conference on Software Engineering, May 2011, pp.111-120.

  36. Lu M, Sun X, Wang S, Lo D, Duan Y. Query expansion via WordNet for effective code search. In Proc. the 22nd IEEE International Conference on Software Analysis, March 2015, pp.545-549.

  37. Haiduc S, Bavota G, Marcus A, Oliveto R, Lucia A D, Menzies T. Automatic query reformulations for text retrieval in software engineering. In Proc. the 35th International Conference on Software Engineering, May 2013, pp.842-851.

  38. Lv F, Zhang H, Lou J, Wang S, Zhang D, Zhao J. CodeHow: Effective code search based on API understanding and extended Boolean model (E). In Proc. the 30th IEEE/ACM International Conference on Automated Software Engineering, November 2015, pp.260-270.

  39. Gu X, Zhang H, Kim S. Deep code search. In Proc. the 40th International Conference on Software Engineering, May 2018, pp.933-944.

  40. Ke Y, Stolee K T, Goues C L, Brun Y. Repairing programs with semantic code search (T). In Proc. the 30th IEEE/ACM International Conference on Automated Software Engineering, November 2015, pp.295-306.

  41. Reiss S P. Semantics-based code search. In Proc. the 31st International Conference on Software Engineering, May 2009, pp.243-253.

  42. Stolee K T, Elbaum S G, Dobos D. Solving the search for source code. ACM Trans. Softw. Eng. Methodol., 2014, 23(3): Article No. 26.

  43. Sirres R, Bissyandé T F, Kim D, Lo D, Klein J, Kim K, Traon Y L. Augmenting and structuring user queries to support efficient free-form code search. In Proc. the 40th International Conference on Software Engineering, May 2018, pp.945-945.

  44. Nguyen A T, Hilton M, Codoban M, Nguyen H A, Mast L, Rademacher E, Nguyen T N, Dig D. API code recommendation using statistical learning from fine-grained changes. In Proc. the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, November 2016, pp.511-522.

  45. Gu X, Zhang H, Zhang D, Kim S. Deep API learning. In Proc. the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, November 2016, pp.631-642.

  46. Huang Q, Xia X, Xing Z, Lo D, Wang X. API method recommendation without worrying about the task-API knowledge gap. In Proc. the 33rd ACM/IEEE International Conference on Automated Software Engineering, September 2018, pp.293-304.

  47. Liu X, Huang L, Ng V. Effective API recommendation without historical software repositories. In Proc. the 33rd ACM/IEEE International Conference on Automated Software Engineering, September 2018, pp.282-292.

  48. Li X, Jiang H, Kamei Y, Chen X. Bridging semantic gaps between natural languages and APIs with word embedding. arXiv:1810.09723, 2018. https://arxiv.org/abs/1810.09723, Sept. 2020.

  49. Mandelin D, Xu L, Bodík R, Kimelman D. Jungloid mining: Helping to navigate the API jungle. In Proc. the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2005, pp.48-61.

  50. Chan W, Cheng H, Lo D. Searching connected API subgraph via text phrases. In Proc. the 20th ACM SIGSOFT Symposium on the Foundations of Software Engineering, November 2012, Article No. 10.

  51. van Nguyen T, Rigby P C, Nguyen A T, Karanfil M, Nguyen T N. T2API: Synthesizing API code usage templates from English texts with statistical translation. In Proc. the 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, November 2016, pp.1013-1017.

  52. Nguyen T T, Nguyen H A, Pham N H, Al-Kofahi J M, Nguyen T N. Graph-based mining of multiple object usage patterns. In Proc. the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT International Symposium on Foundations of Software Engineering, August 2009, pp.383-392.

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Wei Dong.

Supplementary Information

ESM 1

(PDF 411 kb)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Liu, BB., Dong, W., Liu, JX. et al. ProSy: API-Based Synthesis with Probabilistic Model. J. Comput. Sci. Technol. 35, 1234–1257 (2020). https://doi.org/10.1007/s11390-020-0520-4

Download citation

  • Received:

  • Revised:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11390-020-0520-4

Keywords

Navigation