Polyhedral Optimization of TensorFlow Computation Graphs

  • Conference paper
  • In: Programming and Performance Visualization Tools (ESPT 2017, ESPT 2018, VPA 2017, VPA 2018)

Abstract

We present R-Stream·TF, a polyhedral optimization tool for neural network computations. R-Stream·TF translates the computations performed in a neural network graph into C programs suited to the polyhedral representation and uses R-Stream, a polyhedral compiler, to parallelize and optimize them. R-Stream·TF can exploit the optimizations available in R-Stream to generate a highly optimized version of the computation graph, specifically mapped to the targeted architecture. In our experiments, R-Stream·TF automatically reached performance levels close to those of hand-optimized implementations, demonstrating its utility for porting neural network computations to parallel architectures.
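The pipeline the abstract describes, lowering each graph operator to a small C program with static control flow and affine array accesses that a polyhedral compiler can then schedule, tile, and parallelize, can be pictured with a minimal sketch. The MatMul lowering below is illustrative only; the function name, the fixed sizes, and the naive loop order are our assumptions, not the code R-Stream·TF actually emits.

#include <stdio.h>

#define M 64
#define N 64
#define K 64

/* Hypothetical lowering of a single MatMul graph node, C = A x B.
 * Static loop bounds and affine subscripts keep the kernel inside the
 * polyhedral model, so a compiler such as R-Stream can legally reorder,
 * tile, fuse, and parallelize these loops for the target machine. */
static void matmul_node(const float A[M][K], const float B[K][N],
                        float C[M][N]) {
  for (int i = 0; i < M; i++)       /* rows of C */
    for (int j = 0; j < N; j++) {   /* columns of C */
      C[i][j] = 0.0f;
      for (int k = 0; k < K; k++)   /* reduction over the shared axis */
        C[i][j] += A[i][k] * B[k][j];
    }
}

int main(void) {
  static float A[M][K], B[K][N], C[M][N];
  for (int i = 0; i < M; i++)
    for (int k = 0; k < K; k++)
      A[i][k] = 1.0f;
  for (int k = 0; k < K; k++)
    for (int j = 0; j < N; j++)
      B[k][j] = 2.0f;
  matmul_node(A, B, C);
  printf("C[0][0] = %.1f\n", C[0][0]);  /* 64 * 1.0 * 2.0 = 128.0 */
  return 0;
}

Because each operator instance in the graph can be emitted as a self-contained affine kernel of this kind, the graph as a whole becomes a collection of programs that the polyhedral mapper can optimize for the target.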


Notes

  1. Raising generic C code into a polyhedral representation can be a complex problem when programmers or library writers have applied manual optimizations (e.g., parallelization, tiling, ...) based on domain knowledge that cannot easily be inferred from the program source. Such manual optimizations are often not performance portable (or portable at all) to new platforms; performing them by hand thus "bakes" the code to its one original target. To re-optimize for a new architecture through the polyhedral model, such manual optimizations often have to be reverted to produce an efficient polyhedral representation of the program. Unknown aliasing and overflowing arithmetic are among the challenges of such "un-baking"; see the sketch after this note. With modern compiler tools like R-Stream now available, it would be a much more sustainable practice for programmers to express their code in a high-level, domain-specific manner from the start.
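As a concrete, hypothetical illustration of such baking, the first function below hard-wires a tile size chosen for one machine's cache; the second carries the same semantics in a single affine loop that is trivial to raise into the polyhedral model, where tiling can be re-derived per target. Function names and sizes are ours, not from the paper.

#include <stdio.h>

#define N 1024
#define TILE 64  /* tile size tuned by hand for one target's cache */

/* "Baked" version: manual tiling encodes a target-specific choice and
 * obscures the simple iteration domain the compiler must recover. */
void scale_tiled(float a[N], const float b[N]) {
  for (int t = 0; t < N; t += TILE)
    for (int i = t; i < t + TILE; i++)
      a[i] = 2.0f * b[i];
}

/* Clean version: one affine loop over the full domain; a polyhedral
 * compiler can re-tile and parallelize it for each new architecture. */
void scale_plain(float a[N], const float b[N]) {
  for (int i = 0; i < N; i++)
    a[i] = 2.0f * b[i];
}

int main(void) {
  static float a[N], b[N];
  for (int i = 0; i < N; i++)
    b[i] = 1.0f;
  scale_plain(a, b);
  scale_tiled(a, b);             /* same result either way */
  printf("a[0] = %.1f\n", a[0]); /* 2.0 */
  return 0;
}

Reverting the tiled form to the plain one is exactly the "un-baking" the note refers to, and it becomes hard when aliasing between a and b, or overflow in the index arithmetic, cannot be ruled out.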

Acknowledgment

This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the U.S. Government. This document was cleared by DARPA on August 23, 2017. Distribution Statement "A" (Approved for Public Release, Distribution Unlimited).

Author information

Correspondence to Muthu Baskaran.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Pradelle, B., Meister, B., Baskaran, M., Springer, J., Lethin, R. (2019). Polyhedral Optimization of TensorFlow Computation Graphs. In: Bhatele, A., Boehme, D., Levine, J., Malony, A., Schulz, M. (eds) Programming and Performance Visualization Tools. ESPT 2017, ESPT 2018, VPA 2017, VPA 2018. Lecture Notes in Computer Science, vol 11027. Springer, Cham. https://doi.org/10.1007/978-3-030-17872-7_5

  • DOI: https://doi.org/10.1007/978-3-030-17872-7_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-17871-0

  • Online ISBN: 978-3-030-17872-7

  • eBook Packages: Computer Science (R0)
