Graphs Resemblance based Software Birthmarks through Data Mining for Piracy Control


The proliferation of software artifacts heightens the need to protect intellectual property rights (IPR), which software piracy undermines, and calls for effective piracy control measures. Software birthmarking aims to counter software ownership theft by identifying similarity in the origins of programs. This paper proposes a novel birthmarking approach based on a hybrid of text-mining and graph-mining techniques. The code elements of a program and their relations with other elements are identified through their properties (i.e., code constructs) and transformed into Graph Manipulation Language (GML). The software birthmarks, generated by exploiting graph-theoretic properties (through the clustering coefficient), are used to classify two programs as similar or dissimilar. The proposed technique has been evaluated on the metrics of credibility, resilience, method theft, modified code detection, and self-copy detection, and the results support its effectiveness against software ownership theft. A comparative analysis with contemporary approaches shows better results, attributed to capturing the properties and relations of program nodes and to employing dynamic graph-mining techniques without adding overhead (such as increased program size or processing cost).
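To make the role of the clustering coefficient concrete, the following is a minimal sketch, not the authors' implementation. It assumes a program has already been reduced to an undirected graph of code elements, computes the average local clustering coefficient as a single-number birthmark, and compares two programs by the absolute difference of those values. The function names (`build_graph`, `local_clustering`, `average_clustering`, `resemblance`) and the similarity score itself are hypothetical choices made for illustration.

```python
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map from (element, element) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops do not contribute to clustering
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj):
    """Return the local clustering coefficient of every node.

    For a node with k neighbours, the coefficient is the number of edges
    among those neighbours divided by k * (k - 1) / 2.
    """
    coeffs = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            coeffs[node] = 0.0
            continue
        links = sum(1 for u, v in combinations(sorted(nbrs), 2) if v in adj[u])
        coeffs[node] = 2.0 * links / (k * (k - 1))
    return coeffs


def average_clustering(adj):
    """Average of the local coefficients; 0.0 for an empty graph."""
    coeffs = local_clustering(adj)
    return sum(coeffs.values()) / len(coeffs) if coeffs else 0.0


def resemblance(edges_a, edges_b):
    """Hypothetical similarity score in [0, 1]: 1 minus the gap between
    the two average clustering coefficients (an assumption, not the
    paper's classifier)."""
    ca = average_clustering(build_graph(edges_a))
    cb = average_clustering(build_graph(edges_b))
    return 1.0 - abs(ca - cb)
```

Because the score depends only on graph structure, renaming every identifier in a program leaves it unchanged, which is the kind of resilience a birthmark is meant to provide. A single averaged coefficient is a coarse summary, so structurally different programs can still collide; a practical birthmark would combine it with further node properties.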





Author information



Corresponding authors

Correspondence to S. Sarwar or Z. Ul. Qayyum or M. Safyan or M. Iqbal or Y. Mahmood.


Cite this article

Sarwar, S., Qayyum, Z.U., Safyan, M. et al. Graphs Resemblance based Software Birthmarks through Data Mining for Piracy Control. Program Comput Soft 45, 581–589 (2019).
