Graphs Resemblance based Software Birthmarks through Data Mining for Piracy Control


The proliferation of software artifacts underscores the need to protect intellectual property rights (IPR), which software piracy undermines, and calls for effective piracy control measures. Software birthmarking counters ownership theft by identifying whether two programs share a common origin. This paper proposes a novel birthmarking approach based on a hybrid of text-mining and graph-mining techniques. The code elements of a program and their relations with other elements are identified through their properties (i.e., code constructs) and transformed into Graph Manipulation Language (GML). Software birthmarks, generated by exploiting graph-theoretic properties (through the clustering coefficient), are used to classify two programs as similar or dissimilar. The proposed technique has been evaluated on the metrics of credibility, resilience, method theft, modified code detection, and self-copy detection, and the results support its effectiveness against software ownership theft. A comparative analysis with contemporary approaches shows better results, which are attributed to capturing the properties and relations of program nodes and to employing dynamic graph-mining techniques without adding overhead (such as increased program size or processing cost).
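As a rough illustration of the graph-theoretic step, the sketch below computes an average clustering coefficient for two small program graphs and compares them. It is not the authors' implementation: the edge lists, the function names (`build_graph`, `local_clustering`, `average_clustering`, `birthmark_similarity`), and the scoring rule of one minus the absolute difference are all assumptions made for this example, since the abstract does not specify how the coefficient is turned into a similarity decision.

```python
from itertools import combinations


def build_graph(edges):
    """Return an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops in this toy model
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Fraction of neighbour pairs of `node` that are themselves connected."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pair can form a triangle
    links = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    if not adj:
        return 0.0
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def birthmark_similarity(edges_p, edges_q):
    """Hypothetical score in [0, 1]: 1 minus the gap between the averages."""
    cp = average_clustering(build_graph(edges_p))
    cq = average_clustering(build_graph(edges_q))
    return 1.0 - abs(cp - cq)


# Toy graphs: nodes stand for code elements, edges for relations between them.
original = [("Main", "parse"), ("Main", "run"), ("parse", "run"), ("run", "log")]
renamed = [("M", "p"), ("M", "r"), ("p", "r"), ("r", "l")]  # identifiers renamed only
unrelated = [("A", "b"), ("A", "c"), ("A", "d")]            # star-shaped, no triangles

print(birthmark_similarity(original, renamed))
print(birthmark_similarity(original, unrelated))
```

Because the coefficient depends only on graph structure, renaming identifiers leaves the score unchanged in this toy setting. A single scalar is a coarse signature, however: structurally different graphs can share the same average, so a practical birthmark would need richer features than this sketch uses.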


Author information

Correspondence to S. Sarwar, Z. Ul. Qayyum, M. Safyan, M. Iqbal, or Y. Mahmood.

Cite this article

Sarwar, S., Qayyum, Z.U., Safyan, M. et al. Graphs Resemblance based Software Birthmarks through Data Mining for Piracy Control. Program Comput Soft 45, 581–589 (2019).
