An oversampling approach for mining program specifications

  • Deng Chen
  • Yan-duo Zhang
  • Wei Wei
  • Rong-cun Wang
  • Xiao-lin Li
  • Wei Liu
  • Shi-xun Wang
  • Rui Zhu


Automatic protocol mining is a promising approach for inferring accurate and complete API protocols. However, just as with any data-mining technique, this approach requires sufficient training data (object usage scenarios). Existing approaches resolve the problem by analyzing more programs, which may cause significant runtime overhead. In this paper, we propose an inheritance-based oversampling approach for object usage scenarios (OUSs). Our technique is based on the inheritance relationship in object-oriented programs. Given an object-oriented program p, generally, the OUSs that can be collected from a run of p are not more than the objects used during the run. With our technique, a maximum of n times more OUSs can be achieved, where n is the average number of super-classes of all general OUSs. To investigate the effect of our technique, we implement it in our previous prototype tool, ISpecMiner, and use the tool to mine protocols from several real-world programs. Experimental results show that our technique can collect 1.95 times more OUSs than general approaches. Additionally, accurate and complete API protocols are more likely to be achieved. Furthermore, our technique can mine API protocols for classes never even used in programs, which are valuable for validating software architectures, program documentation, and understanding. Although our technique will introduce some runtime overhead, it is trivial and acceptable.

Key words

Object usage scenario API protocol mining Program temporal specification mining Oversampling 

CLC number



Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.


  1. Alur R, Černý P, Madhusudan P, et al., 2005. Synthesis of interface specifications for Java classes. ACM SIGPLAN Not, 40(1):98–109. CrossRefzbMATHGoogle Scholar
  2. Ammons G, Bodík R, Larus JR, 2002. Mining specifications. ACM SIGPLAN Not, 37(1):4–16. CrossRefzbMATHGoogle Scholar
  3. Bruce KB, Wegner P, 1986. An algebraic model of sybtypes in object-oriented languages (draft). ACM SIGPLAN Not, 21(10):163–172. CrossRefGoogle Scholar
  4. Caserta P, Zendra O, 2014. JBInsTrace: a tracer of Java and JRE classes at basic-block granularity by dynamically instrumenting bytecode. Sci Comput Program, 79:116–125. CrossRefGoogle Scholar
  5. Chang RY, Podgurski A, Yang J, 2007. Finding what’s not there: a new approach to revealing neglected conditions in software. Proc Int Symp on Software Testing and Analysis, p.163–173. Google Scholar
  6. Chen D, Huang RB, Qu BB, et al., 2014. Improving static analysis performance using rule-filtering technique. Proc 26th Int Conf on Software Engineering and Knowledge Engineering, p.19–24.Google Scholar
  7. Chen D, Huang RB, Qu BB, et al., 2015a. Mining class temporal specification dynamically based on extended Markov model. Int J Softw Eng Knowl Eng, 25(3):573–604. CrossRefGoogle Scholar
  8. Chen D, Zhang YD, Wang RC, et al., 2015b. Extracting more object usage scenarios for API protocol mining. Proc 27th Int Conf on Software Engineering and Knowledge Engineering, p.607–612. Google Scholar
  9. Chen D, Zhang YD, Wei W, et al., 2017. Efficient vulnerability detection based on an optimized rule-checking static analysis technique. Front Inform Technol Electron Eng, 18(3):332–345. CrossRefGoogle Scholar
  10. Cook JE, Wolf AL, 1998. Discovering models of software processes from event-based data. ACM Trans Softw Eng Methodol, 7(3):215–249. CrossRefGoogle Scholar
  11. Dai ZY, Mao XG, Lei Y, et al., 2014. Compositional mining of multiple object API protocols through state abstraction. Sci World J, Article 171 647.Google Scholar
  12. Dallmeier V, Lindig C, Wasylkowski A, et al., 2006. Mining object behavior with ADABU. Proc Int Workshop on Dynamic Systems Analysis, p.17–24. Google Scholar
  13. Dallmeier V, Knopp N, Mallon C, et al., 2012. Automatically generating test cases for specification mining. IEEE Trans Softw Eng, 38(2):243–257. CrossRefGoogle Scholar
  14. Engler D, Chen DY, Hallem S, et al., 2001. Bugs as deviant behavior: a general approach to inferring errors in systems code. ACM SIGOPS Oper Syst Rev, 35(5):57–72. CrossRefGoogle Scholar
  15. Ernst MD, Perkins JH, Guo PJ, et al., 2007. The Daikon system for dynamic detection of likely invariants. Sci Comput Program, 69(1-3):35–45. MathSciNetCrossRefzbMATHGoogle Scholar
  16. Kernighan BW, Ritchie DM, 1988. The C Programming Language (2nd Ed.). Prentice Hall, Englewood Cliffs, NJ.Google Scholar
  17. Li ZM, Zhou YY, 2005. PR-miner: automatically extracting implicit programming rules and detecting violations in large software codes. ACM SIGSOFT Softw Eng Not, 30(5):306–315. CrossRefGoogle Scholar
  18. Liskov B, 1988. Keynote address—data abstraction and hierarchy. ACM SIGPLAN Not, 23(5):17–34. CrossRefGoogle Scholar
  19. Lorenzoli D, Mariani L, Pezzè M, 2008. Automatic generation of software behavioral models. Proc 30th Int Conf on Software Engineering, p.501–510. Google Scholar
  20. Pradel M, Gross TR, 2012. Leveraging test generation and specification mining for automated bug detection without false positives. Proc 34th Int Conf on Software Engineering, p.288–298. Google Scholar
  21. Pradel M, Jaspan C, Aldrich J, et al., 2012. Statically checking API protocol conformance with mined multi-object specifications. Proc 34th Int Conf on Software Engineering, p.925–935. Google Scholar
  22. Ramanathan MK, Grama A, Jagannathan S, 2007. Static specification inference using predicate mining. ACM SIGPLAN Not, 42(6):123–134. CrossRefGoogle Scholar
  23. Shoham S, Yahav E, Fink S, et al., 2007. Static specification mining using automata-based abstractions. Proc Int Symp on Software Testing and Analysis, p.174–184. Google Scholar
  24. Skala V, Petruska R, 2014. A new approach to hash function construction for textual data: a comparison. Proc 4th World Congress on Information and Communication Technologies, p.39–44. Google Scholar
  25. Tatsubori M, Sasaki T, Chiba S, et al., 2001. A bytecode translator for distributed execution of “legacy” Java software. Proc 15th European Conf on Object-Oriented Programming, p.236–255.Google Scholar
  26. Thummalapenta S, Xie T, 2011. Alattin: mining alternative patterns for defect detection. Autom Softw Eng, 18(3-4):293–323. CrossRefGoogle Scholar
  27. Wasylkowski A, 2007. Mining object usage models. Proc 29th Int Conf on Software Engineering, p.93–94. Google Scholar
  28. Wasylkowski A, Zeller A, Lindig C, 2007. Detecting object usage anomalies. Proc 6th Joint Meeting of the European Software Engineering Conf and the ACM SIGSOFT Symp on Foundations of Software Engineering, p.35–44. Google Scholar

Copyright information

© Zhejiang University and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. 1.Hubei Provincial Key Laboratory of Intelligent RobotWuhan Institute of TechnologyWuhanChina
  2. 2.School of Computer Science and TechnologyChina University of Mining and TechnologyXuzhouChina
  3. 3.School of Computer and Information EngineeringHenan Normal UniversityXinxiangChina

Personalised recommendations