Abstract
Dynamic program analysis encompasses the development of techniques and tools that analyze computer software using information gathered from a program at runtime. Dynamic analysis tools collect very large amounts of data, which calls for efficient indexing and compression schemes as well as online algorithmic techniques that mine relevant information on the fly to identify frequent events, hidden software patterns, or undesirable behaviors corresponding to bugs, malware, or intrusions. This paper explores how recent results in algorithmic theory for data-intensive scenarios can be applied to the design and implementation of dynamic program analysis tools, focusing on two important techniques: sampling and streaming.
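As a rough illustration of the two techniques named above, the following Python sketch shows one standard algorithm from each family: reservoir sampling (Algorithm R) for keeping a uniform sample of a trace of unknown length, and the Misra-Gries summary for finding frequent events in small space. These are textbook choices made here for illustration, not necessarily the algorithms discussed in the paper, and the names `reservoir_sample`, `misra_gries`, and `trace` are hypothetical.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Uniform random sample of up to k items from a stream (Algorithm R)."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def misra_gries(stream, k):
    """Frequent-item candidates using at most k - 1 counters.

    Every item that occurs more than n / k times in a stream of length n
    is guaranteed to be among the returned keys. The stored counts are
    underestimates, and some returned keys may be infrequent.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement all and drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


# Hypothetical trace of routine-call events, invented for this example.
trace = ["f"] * 60 + ["g"] * 30 + ["h%d" % i for i in range(10)]

sample = reservoir_sample(trace, 5, random.Random(0))
candidates = misra_gries(trace, 3)
```

Both routines read the trace once and use memory independent of its length: `k` slots for the sample and at most `k - 1` counters for the summary. A second pass, or exact counting restricted to the candidate keys, would be needed to confirm which candidates are truly frequent.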
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Finocchi, I. (2013). Software Streams: Big Data Challenges in Dynamic Program Analysis. In: Bonizzoni, P., Brattka, V., Löwe, B. (eds) The Nature of Computation. Logic, Algorithms, Applications. CiE 2013. Lecture Notes in Computer Science, vol 7921. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39053-1_15
DOI: https://doi.org/10.1007/978-3-642-39053-1_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39052-4
Online ISBN: 978-3-642-39053-1
eBook Packages: Computer Science (R0)