Skip to main content
Log in

ABSLearn: a GNN-based framework for aliasing and buffer-size information retrieval

  • Theoretical Advances
  • Published:
Pattern Analysis and Applications Aims and scope Submit manuscript

Abstract

Inferring aliasing and buffer-size information is important to understanding a C program's memory layout, which is critical to program analysis and security-related tasks. However, traditional static and dynamic program analysis methods suffer from certain limitations: static alias analysis methods suffer from precision loss and have poor scalability. Meanwhile, although dynamic analysis can achieve high precision, there is no soundness guarantee, and an online analysis may cause non-negligible runtime overhead. Besides, the current methods can only capture aliasing information. As for the buffer-size relational information, which is the specific variable storing the size of the buffer pointed by the pointers, it is tough to analyze by traditional methods. Moreover, we observe that most methods are designed for specific information. To address these limitations, we present ABSLearn, a deep learning framework that is capable of retrieving both aliasing and buffer-size information from C programs. The core idea of ABSLearn is to formulate the information retrieval as a link prediction problem, where a Graph Neural Network (GNN) model is applied to solve the problem. We developed the first related dataset that contains 285 C program samples to train ABSLearn. Then, the trained model is applied to infer the information on three practical benchmarks: Gzip-1.2.4, Make-3.80, and Tar-1.15.1. The results show that ABSLearn achieves comparable performance and excellent runtime performance. As the first attempt at applying GNN to infer aliasing and buffer-size information, ABSLearn can potentially benefit future program analysis frameworks.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8

Similar content being viewed by others

Data availability

The data used to support the experiments and findings of this study will be made available upon publication of the article.

Notes

  1. The C library function void *realloc(void *ptr, size_t size) attempts to resize the memory block pointed to by ptr that was previously allocated with a call to malloc or calloc.

References

  1. Balakrishnan G, Reps T (2004) Analyzing Memory Accesses in x86 Executables. In: Duesterwald E (ed) Compiler construction. CC 2004. Lecture notes in computer science, vol 2985. Springer, Heidelberg. https://doi.org/10.1007/978-3-540-24723-4_2

    Chapter  Google Scholar 

  2. Evans I, Long F, Otgonbaatar U, Shrobe H, Rinard M, Okhravi H, Sidiroglou-Douskos S (2015) Control jujutsu: on the weaknesses of fine-grained control flow integrity. In: Proceedings of the 22nd ACM SIGSAC conference on computer and communications security. Association for Computing Machinery, New York, p 901–913. https://doi.org/10.1145/2810103.2813646

  3. Zeng D, Tan G (2018) From debugging-information based binary-level type inference to CFG generation. In: Proceedings of the eighth ACM conference on data and application security and privacy. Association for Computing Machinery, New York, p 366–376.https://doi.org/10.1145/3176258.3176309

  4. Lu K, Hu H (2019) Where does it go? Refining indirect-call targets with multi-layer type analysis. In: Proceedings of the 2019 ACM SIGSAC conference on computer and communications security. Association for Computing Machinery, p 1867–1881. https://doi.org/10.1145/3319535.3354244

  5. Kim SH, Sun C, Zeng D, Tan G (2021) Refining indirect call targets at the binary level. Netw Distrib Syst Secur Symp. https://doi.org/10.14722/ndss.2021.24386

    Article  Google Scholar 

  6. Abadi M, Budiu M, Erlingsson Ú, Ligatti J (2009) Control-flow integrity principles, implementations, and applications. ACM Trans Inf Syst Secur 13(1):40. https://doi.org/10.1145/1609956.1609960

    Article  Google Scholar 

  7. Zhang C, Wei T, Chen Z, Duan L, Szekeres L, McCamant S, Song D, Zou W (2013) Practical control flow integrity and randomization for binary executables. In: IEEE symposium on security and privacy (S&P). pp 559–573. https://doi.org/10.1109/SP.2013.44

  8. Zhang M, Sekar R (2013) Control flow integrity for COTS binaries. In: 22nd Usenix security symposium. pp 337–352. https://doi.org/10.5555/2534766.2534796

  9. Tice C, Roeder T, Collingbourne P, Checkoway S, Erlingsson Ú, Lozano L, Pike G (2014) Enforcing forward-edge control-flow integrity in GCC & LLVM. In: Proceedings of the 23rd USENIX conference on Security Symposium (SEC'14). USENIX Association, USA, pp 941–955. https://doi.org/10.5555/2671225.2671285

  10. Niu B, Tan G (2014) Modular control-flow integrity. In: SIGPLAN Not. 49, 6 (June 2014), pp 577–587. https://doi.org/10.1145/2666356.2594295

  11. Niu B, Tan G (2015) Per-Input control-flow integrity. In: Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security (CCS '15). 914–926. https://doi.org/10.1145/2810103.2813644

  12. van der Veen V, Andriesse D, Göktaş E, Gras B, Sambuc L, Slowinska A, Bos H, Giuffrida C (2015) Practical context-sensitive CFI. In: Proceedings of the 22nd ACM SIGSAC conference on computer and communications security (CCS '15). Association for Computing Machinery, New York, pp 927–940. https://doi.org/10.1145/2810103.2813673

  13. Ge X, Talele N, Payer M, Jaeger T (2016) Fine-grained control-flow integrity for kernel software. In: IEEE European symposium on security and privacy (EuroS&P), pp 179–194. https://doi.org/10.1109/EuroSP.2016.24

  14. van der Veen V, G ̈oktas E, Contag M, Pawoloski A, Chen X, Rawat S, Bos H, Holz T, Athanasopoulos E, Giuffrida C (2016) A tough call: Mitigating advanced code-reuse attacks at the binary level,” in IEEE Symposium on Security and Privacy (S&P). pp. 934–953. https://doi.org/10.1109/SP.2016.60

  15. Ding R, Qian C, Song C, Harris B, Kim T, Lee W (2017) Efficient protection of path-sensitive control security. In: 26th Usenix security symposium, pp 131–148. https://doi.org/10.5555/3241189.3241201

  16. Khandaker M, Naser A, Liu W, Wang Z, Zhou Y, Cheng Y (2019) Adaptive call-site sensitive control flow integrity. In: IEEE european symposium on security and privacy (EuroS&P), pp 95–110. https://doi.org/10.1109/EuroSP.2019.00017

  17. Khandaker MR, Liu W, Naser A, Wang Z, Yang J (2019) Origin-sensitive control flow integrity. In: 28th Usenix security symposium, pp 195–211. https://doi.org/10.5555/33.3361338.3361353

  18. Nagarakatte S, Zhao J, Martin MMK, Zdancewic S (2009) Soft-bound: highly compatible and complete spatial memory safety for C. In: ACM conference on programming language design and implementation (PLDI), pp 245–258. https://doi.org/10.1145/1543135.1542504

  19. Xu S, Huang W, Lie D (2021) In-fat pointer: hardware-assisted tagged-pointer spatial memory safety defense with subobject granularity protection. In: Proceedings of the 26th ACM international conference on architectural support for programming languages and operating systems, 224–240. https://doi.org/10.1145/3445814.3446761

  20. Serebryany K, Bruening D, Potapenko A, Vyukov D (2012) Addresssanitizer: A fast address sanity checker. In: USENIX ATC. https://doi.org/10.5555/2342821.2342849

  21. Zdancewic S, Zheng L, Nystrom N, Myers AC (2002) Secure program partitioning. ACM Trans Comput Syst. https://doi.org/10.1145/566340.566343

    Article  Google Scholar 

  22. Zheng L, Chong S, Myers A, Zdancewic S (2003) Using replication and partitioning to build secure distributed systems. In: IEEE symposium on security and privacy (S&P), pp 236–250. https://doi.org/10.1109/SECPRI.2003.1199340

  23. Provos N, Friedl M, Honeyman P (2003) Preventing privilege escalation. In: 12th Usenix security symposium, pp 231–242. https://doi.org/10.5555/1251353.1251369

  24. Kilpatrick D (2003) Privman: a library for partitioning applications. In: USENIX annual technical conference, pp 273–284

  25. Brumley D, Song D (2004) Privtrans: Automatically partitioning programs for privilege separation. In: 13th Usenix Security Symposium, pp 57–72. https://doi.org/10.5555/1251375.1251380

  26. Chong S, Liu J, Myers A, Qi X, Vikram K, Zheng L, Zheng X (2007) Secure web applications via automatic partitioning. In: ACM SIGOPS symposium on operating systems principles (SOSP). pp 31–44. https://doi.org/10.1145/1323293.1294265

  27. Bittau A, Marchenko P, Handley M, Karp B (2008) Wedge: splitting applications into reduced-privilege compartments. In: Proceedings of the 5th USENIX symposium on networked systems design and implementation. pp 309–322. https://doi.org/10.5555/1387589.1387611

  28. Krishnamurthy A, Mettler A, Wagner D (2010) Fine-grained privilege separation for web applications. In: Proceedings of the 19th international conference on world wide web, pp 551–560. https://doi.org/10.1145/1772690.1772747

  29. Niu B, Tan G (2012) Enforcing user-space privilege separation with declarative architectures. In: Proceedings of the sixth ACM workshop on scalable trusted computing (STC), pp 9–20. https://doi.org/10.1145/2382536.2382541

  30. Dong X, Hu H, Saxena P, Liang Z (2013) A quantitative evaluation of privilege separation in web browser designs. In: 18th european symposium on research in computer security (ESORICS), pp 75–93. https://doi.org/10.1145/3319535.3354218

  31. Wu Y, Sun J, Liu Y, Dong JS (2013) Automatically partition software into least privilege components using dynamic data dependency analysis. In: International conference on automated software engineering (ASE). pp 323–333. https://doi.org/10.1109/ASE.2013.6693091.

  32. Liu Y, Zhou T, Chen K, Chen H, Xia Y (2015) Thwarting memory disclosure with efficient hypervisor-enforced intra-domain isolation. In: 22nd ACM conference on computer and communications security (CCS). pp 1607–1619. https://doi.org/10.1145/2810103.2813690

  33. Rubinov K, Rosculete L, Mitra T, Roychoudhury A (2016) Automated partitioning of Android applications for trusted execution environments. In: International conference on software engineering (ICSE). pp 923–934. https://doi.org/10.1145/2884781.2884817

  34. Jacobsen C, Khole M, Spall S, Bauer S, Burtsev A (2016) Lightweight capability domains: Towards decomposing the linux kernel. SIGOPS Oper Syst Rev. https://doi.org/10.1145/2883591.2883601

    Article  Google Scholar 

  35. Mambretti A, Onarlioglu K, Mulliner C, Robertson W, Kirda E, Maggi F, Zanero S (2016) Trellis: privilege separation for multi-user applications made easy. In: International symposium on research in attacks, intrusions and Defenses (RAID), pp. 437–456. https://doi.org/10.1007/978-3-319-45719-2_20

  36. Lind J, Priebe C, Muthukumaran D, O’Keeffe D, Aublin P, Kelbert F, Reiher T, Goltzsche D, Eyers DM, Kapitza R, Fetzer C, Pietzuch PR (2017) Glamdring: automatic application partitioning for intel SGX. In: USENIX annual technical conference (ATC). pp. 285–298. https://dl.acm.org/doi/https://doi.org/10.5555/3154690.3154718

  37. Liu S, Tan G, Jaeger T (2017) PtrSplit: supporting general pointers in automatic program partitioning. In 24th ACM conference on computer and communications security (CCS), pp 2359–2371. https://doi.org/10.1145/3133956.3134066

  38. Liu S, Zeng D, Huang Y, Capobianco F, McCamant S, Jaeger T, Tan G (2019) Program-mandering: Quantitative privilege separation. In: 26th ACM conference on computer and communications security (CCS)., pp 1023–1040. https://doi.org/10.1145/3319535.3354218

  39. Hind M (2001) Pointer analysis: haven’t we solved this problem yet? In: ACM SIGPLAN/SIGSOFT workshop on program analysis for software tools and engineering.https://doi.org/10.1145/379605.379665

  40. Zhang Q, Xiao X, Zhang C, Yuan H, Su Z (2014) Efficient subcubic alias analysis for c. In: ACM SIGPLAN notices. pp 829–845. https://doi.org/10.1145/2714064.2660213

  41. Kroes T, Koning K, van der Kouwe E, Bos H, Giuffrida C (2018) Delta pointers: Buffer overflow checks without the checks. In: Proceedings of the thirteenth eurosys conference, ser. EuroSys 18. https://doi.org/10.1145/3190508.3190553

  42. Ben-Nun T, Jakobovits AS, Hoefler T (2018) Neural code comprehension: A learnable representation of code semantics. In: NeurIPS. https://doi.org/10.5555/3327144.3327276

  43. Liang K, Wu S, Gu J (2021) MKA: A scalable medical knowledge assisted mechanism for generative models on medical conversation tasks. In: Computational and mathematical methods in medicine. pp. 148–168. https://doi.org/10.1155/2021/5294627.

  44. Teru KK, Denis E, Hamilton WL (2020) Inductive relation prediction by subgraph reasoning. ICML. https://doi.org/10.5555/3524938.3525814

    Article  Google Scholar 

  45. Kumar A, Singh SS, Singh K, Biswas B (2020) Link prediction techniques, applications, and performance: a survey. Phys A Stat Mech Appl 553:124289. https://doi.org/10.1016/j.physa.2020.124289

    Article  MathSciNet  MATH  Google Scholar 

  46. Zhou T, Liu L, Zhang YC (2009) Predicting missing links via local information. Eur Phys J 71:623–630. https://doi.org/10.1140/EPJB/E2009-00335-8

    Article  MATH  Google Scholar 

  47. Adamic L, Adar E (2001) Friends and neighbors on the web. Social Netw 25:211–230. https://doi.org/10.1016/S0378-8733(03)00009-1

    Article  Google Scholar 

  48. Jaccard P (1901) Distribution de la flore alpine dans le Bassin des Dranses et dans quelques regions voisines. Bulletin de la Société Vaudoise des Sciences Naturelles 37:241–272

    Google Scholar 

  49. Barabasi A, Jeong H, Neda Z, Ravasz E, Schubert A, Vicsek T (2002) Evolution of the social network of scientific collaborations. Phys A Stat Mech Appl 311:590–614. https://doi.org/10.1016/S0378-4371(02)00736-7

    Article  MathSciNet  MATH  Google Scholar 

  50. Katz L (1953) A new status index derived from sociometric analysis. Psychometrika 18(1):39–43. https://doi.org/10.1007/BF02289026

    Article  MATH  Google Scholar 

  51. Liben-Nowell D, Kleinberg J (2007) The link-prediction problem for social networks. J Am Soc Inform Sci Technol 58(7):1019–1031. https://doi.org/10.1002/asi.20591

    Article  Google Scholar 

  52. Tong H, Faloutsos C, Pan JY (2006) Fast random walk with restart and its applications. In: Sixth international conference on data mining, pp 613–622. IEEE. Doi: https://doi.org/10.1109/ICDM.2006.70

  53. Clauset A, Moore C, Newman M (2008) Hierarchical structure and the prediction of missing links in networks. Nature 453(7191):98–101. https://doi.org/10.1038/nature06830

    Article  Google Scholar 

  54. Guimerà R, Sales-Pardo M (2009) Missing and spurious interactions and the reconstruction of complex networks. Proc Natl Acad Sci 106(52):22073–22078. https://doi.org/10.1073/pnas.0908366106

    Article  Google Scholar 

  55. Stanley N, Bonacci T, Kwitt R, Niethammer M, Mucha PJ (2019) Stochastic block models with multiple continuous attributes. Appl Netw Sci 4:1–22. https://doi.org/10.1007/s41109-019-0170-z

    Article  Google Scholar 

  56. Vallès-Català T, Peixoto TP, Sales-Pardo M, Guimerà R (2018) Consistencies and inconsistencies between model selection and link prediction in networks. Phys Review E 97(6–1):062316. https://doi.org/10.1103/PhysRevE.97.062316

    Article  Google Scholar 

  57. Kuo TT, Yan R, Huang YY, Kung PH, Lin SD (2013) Unsupervised link prediction using aggregative statistics on heterogeneous social networks. In Proceedings of the 19th ACM SIGKDD international conference on knowledge discovery and data mining, ser. KDD ’13, Association for Computing Machinery, New York, pp 775–783. https://doi.org/10.1145/2487575.2487614

  58. Yang J, McAuley J, Leskovec J (2013) Community detection in networks with node attributes. In: IEEE 13th international conference on data mining, pp 1151–1156. doi: https://doi.org/10.1109/ICDM.2013.167

  59. Perozzi B, Al-Rfou R, Skiena S (2014) Deepwalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining, ser. KDD ’14. ACM, New York, pp 701–710. https://doi.org/10.1145/2623330.2623732

  60. Grover A, Leskovec J (2016) Node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining , ser. KDD ’16. Association for Computing Machinery, New York, p 855–864. https://doi.org/10.1145/2939672.2939754

  61. Ribeiro L F, Saverese P H, Figueiredo D R (2017) Struc2vec: Learning node representations from structural identity. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining , ser. KDD ’17. ACM, New York, pp 385–394. https://doi.org/10.1145/3097983.3098061

  62. Perozzi B, Kulkarni V, Chen H, Skiena S (2017) Don't Walk, Skip! Online learning of multi-scale network embeddings. In: Proceedings of the 2017 IEEE/ACM international conference on advances in social networks analysis and mining 2017, pp 258–265. https://doi.org/10.1145/3110025.3110086

  63. Schlichtkrull M, Kipf T N, Bloem P, Berg R V D, Titov I, Welling M (2018) Modeling relational data with graph convolutional networks. In: European semantic web conference, Springer, Cham, pp 593–607

  64. Hind M (2001) Pointer analysis: haven't we solved this problem yet?. In: Proceedings of the 2001 ACM SIGPLAN-SIGSOFT workshop on program analysis for software tools and engineering, pp 54–61. https://doi.org/10.1145/379605.379665

  65. Smaragdakis Y, Balatsouras G (2015) Pointer analysis. Found Trends Program Lang 2(1):1–69. https://doi.org/10.1561/2500000014

    Article  Google Scholar 

  66. Sui Y, Xue J (2016) SVF: interprocedural static value-flow analysis in LLVM. In: Proceedings of the 25th international conference on compiler construction, pp 265–266. https://doi.org/10.1145/2892208.2892235

  67. Gurfinkel A, Navas J A (2017) A context-sensitive memory model for verification of C/C++ programs. In: International static analysis symposium. Springer, pp 148–168. https://doi.org/10.1007/978-3-319-66706-5_8

  68. Xu W, DuVarney D C, Sekar R (2004) An efficient and backwards-compatible transformation to ensure memory safety of C programs. In: Proceedings of the 12th ACM SIGSOFT twelfth international symposium on foundations of software engineering, pp 117–126. https://doi.org/10.1145/1029894.1029913

  69. Blanchet B, Cousot P, Cousot R, Feret J, Mauborgne L, Miné A, Monniaux D, Rival X (2003) A static analyzer for large safety-critical software. In: Proceedings of the ACM SIGPLAN 2003 conference on programming language design and implementation, pp 196–207. https://doi.org/10.1145/781131.781153

  70. Dor N, Rodeh M, Sagiv M (2003) CSSV: Towards a realistic tool for statically detecting all buffer overflows in C. In: Proceedings of the ACM SIGPLAN 2003 conference on programming language design and implementation, pp 155–167. https://doi.org/10.1145/781131.781149

  71. Nethercote N, Fitzhardinge J (2004) Bounds-checking entire programs without recompiling. SPACE

  72. Narayanan V, Huang Y, Tan G, Jaeger T, Burtsev A (2020) Lightweight kernel isolation with virtualization and VM functions. In: Proceedings of the 16th ACM SIGPLAN/SIGOPS international conference on virtual execution environments, pp 157–171. https://doi.org/10.1145/3381052.3381328

  73. Ferrante J, Ottenstein KJ, Warren JD (1987) The program dependence graph and its use in optimization. ACM Trans Program Lang Syst 9:319–349. https://doi.org/10.1145/24039.24041

    Article  MATH  Google Scholar 

  74. Gilmer J, Schoenholz S S, Riley P F, Vinyals O, Dahl G E (2017) Neural message passing for quantum chemistry. In: International conference on machine learning, ser. Proceedings of Machine Learning Research, D. Precup and Y. W. I, Eds., vol 70. PMLR, 06–11 Aug 2017, pp 1263–1272. https://doi.org/10.5555/3305381.3305512

  75. Xu K, Hu W, Leskovec J, Jegelka S (2018) How powerful are graph neural networks? In: International conference on learning representations

  76. Schlichtkrull M, Kipf T N, Bloem P, Berg R V D, Titov I, Welling M (2018) Modeling relational data with graph convolutional networks. In: European semantic web conference, pp 593–607. Springer, Cham. https://doi.org/10.1007/978-3-319-93417-4_38

  77. Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y (2018) Graph attention networks.In: International conference on learning representations (ICLR). https://arxiv.org/abs/1710.10903

  78. Andersen LO (1994) Program analysis and specialization for the C programming language. University of Copenhagen, DIKU

    Google Scholar 

  79. Liang K, Wu S, Gu J (2021) MKA: A scalable medical knowledge-assisted mechanism for generative models on medical conversation tasks. Comput Math Methods Med. https://doi.org/10.1155/2021/5294627

    Article  Google Scholar 

  80. Chen J, He H, Wu F, Wang J (2021) Topology-aware correlations between relations for inductive link prediction in knowledge graphs. In: Proceedings of the AAAI Conference on Artificial Intelligence, Vol 35, No 7, pp 6271–6278

  81. Mai S, Zheng S, Yang Y, Hu H (2021) Communicative message passing for inductive relation reasoning. In AAAI, pp 4294–4302

  82. Liang K, Meng L, Liu M, Liu Y, Tu W, Wang S, Zhou S, Liu X, Sun F (2022) Reasoning over different types of knowledge graphs: static, temporal and multi-modal. https://doi.org/10.48550/arXiv.2212.05767

Download references

Funding

The authors did not have any specific funding related to this work.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ke Liang.

Ethics declarations

Conflict of interest

The authors declare that they have no competing interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Liang, K., Tan, J., Zeng, D. et al. ABSLearn: a GNN-based framework for aliasing and buffer-size information retrieval. Pattern Anal Applic 26, 1171–1189 (2023). https://doi.org/10.1007/s10044-023-01142-2

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10044-023-01142-2

Keywords

Navigation