Portable Intermediate Representation for Efficient Big Data Analytics

Part of the Lecture Notes in Computer Science book series (LNCCN, volume 12718)


Over the last few years, applications have relied on data processing libraries to handle big data; however, these libraries are not optimized to work together for efficient processing. Intermediate Representations (IRs) have been introduced to unify essential functions behind an abstract interface that supports cross-optimization between applications. Still, the efficiency of an IR depends on the architecture and on the tools required for compilation and execution. In this paper, we present a first glance at a framework that provides an IR by creating containers with executable code from structures of data analytics functions described in an input grammar. These containers process data in query lists, and they can be executed either standalone or integrated with other big data analytics applications without the need to compile the entire framework.
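To make the idea of "query lists built from analytics functions described in a grammar" more concrete, the following is a minimal illustrative sketch only. It is not the framework's actual grammar, IR, or container mechanism, none of which are specified in this abstract. The pipe-separated syntax, the function names `filter`, `scale`, and `topk`, and the helpers `parse_step`, `compile_query`, and `run_query_list` are all hypothetical and chosen purely for illustration.

```python
import operator
from typing import Callable, List

# A step maps a list of numeric rows to a new list of rows.
Step = Callable[[List[float]], List[float]]

# Hypothetical comparison symbols accepted by the toy grammar.
COMPARATORS = {">": operator.gt, "<": operator.lt}


def parse_step(token: str) -> Step:
    """Turn one grammar token such as 'filter > 2' into an executable step."""
    parts = token.split()
    name, args = parts[0], parts[1:]
    if name == "filter":
        compare, bound = COMPARATORS[args[0]], float(args[1])
        return lambda rows: [r for r in rows if compare(r, bound)]
    if name == "scale":
        factor = float(args[0])
        return lambda rows: [r * factor for r in rows]
    if name == "topk":
        k = int(args[0])
        return lambda rows: sorted(rows, reverse=True)[:k]
    raise ValueError(f"unknown function: {name}")


def compile_query(text: str) -> Step:
    """Compile a pipe-separated query into a single callable."""
    steps = [parse_step(t.strip()) for t in text.split("|") if t.strip()]

    def run(rows: List[float]) -> List[float]:
        for step in steps:
            rows = step(rows)
        return rows

    return run


def run_query_list(queries: List[str], rows: List[float]) -> List[List[float]]:
    """Run every query in the list against its own copy of the input rows."""
    return [compile_query(q)(list(rows)) for q in queries]


if __name__ == "__main__":
    results = run_query_list(["filter > 2 | scale 10 | topk 2", "topk 1"], [1, 5, 3, 2])
    print(results)
```

In this toy version each query compiles to a plain Python closure; in the setting the abstract describes, the compiled unit would instead be packaged as a container that can run standalone or alongside other analytics applications.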


This research has been supported by the European Union through the H2020 952215 TAILOR project.

Corresponding author

Correspondence to Vana Kalogeraki.

Copyright information

© 2021 IFIP International Federation for Information Processing

About this paper

Cite this paper

Tzouros, G., Tsenos, M., Kalogeraki, V. (2021). Portable Intermediate Representation for Efficient Big Data Analytics. In: Matos, M., Greve, F. (eds) Distributed Applications and Interoperable Systems. DAIS 2021. Lecture Notes in Computer Science, vol 12718. Springer, Cham.

  • DOI:

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-78197-2

  • Online ISBN: 978-3-030-78198-9

  • eBook Packages: Computer Science, Computer Science (R0)