Abstract
We present R-Stream·TF, a polyhedral optimization tool for neural network computations. R-Stream·TF transforms the computations performed in a neural network graph into C programs suited to the polyhedral representation, and uses R-Stream, a polyhedral compiler, to parallelize and optimize them. R-Stream·TF can exploit the optimizations available in R-Stream to generate a highly optimized version of the computation graph, specifically mapped to the target architecture. In our experiments, R-Stream·TF automatically reached performance levels close to those of hand-optimized implementations, demonstrating its utility in porting neural network computations to parallel architectures.
Notes
1. Raising generic C code into a polyhedral representation can be a complex problem when programmers or library writers have made manual optimizations (e.g., parallelization, tiling, ...) based on domain knowledge that cannot easily be inferred from the program source. Such manual optimizations are often not performance portable (or portable at all) to new platforms; the manual optimization effectively “bakes” the code to that one original target. To re-optimize for a new architecture through the polyhedral model, such manual optimizations often have to be reverted to produce an efficient polyhedral representation of the program. Unknown aliasing and overflowing arithmetic are among the challenges of such “un-baking.” With modern compiler tools like R-Stream now available, it would be a much more sustainable practice for programmers to express their code originally in a high-level, domain-specific manner.
Acknowledgment
This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Defense Advanced Research Projects Agency or the U.S. Government. This document was cleared by DARPA on August 23, 2017. Distribution Statement “A” (Approved for Public Release, Distribution Unlimited).
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Pradelle, B., Meister, B., Baskaran, M., Springer, J., Lethin, R. (2019). Polyhedral Optimization of TensorFlow Computation Graphs. In: Bhatele, A., Boehme, D., Levine, J., Malony, A., Schulz, M. (eds) Programming and Performance Visualization Tools. ESPT 2017, ESPT 2018, VPA 2017, VPA 2018. Lecture Notes in Computer Science, vol 11027. Springer, Cham. https://doi.org/10.1007/978-3-030-17872-7_5
DOI: https://doi.org/10.1007/978-3-030-17872-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17871-0
Online ISBN: 978-3-030-17872-7
eBook Packages: Computer Science (R0)