Improving the Performance of X10 Programs by Clock Removal

Feautrier, Paul; Violard, Éric; Ketterlin, Alain

doi:10.1007/978-3-642-54807-9_7

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 8409)

Included in the following conference series:

International Conference on Compiler Construction

Abstract

X10 is a promising recent parallel language designed specifically to address the challenges of productively programming a wide variety of target platforms. The sequential core of X10 is an object-oriented language in the Java family. This core is augmented by a few parallel constructs that create activities as a generalization of the well-known fork/join model. Clocks are a generalization of the familiar barriers. Synchronization on a clock is specified by the Clock.advanceAll() method call. Activities that execute an advance stall until all existing activities have done the same, and are then released at the same (logical) time.

This naturally raises the following question: are clocks strictly necessary for X10 programs? Surprisingly enough, the answer is no, at least for sufficiently regular programs. The idea is to assign a date to each operation, denoting the number of advances that its activity has executed before the operation. Operations with the same date constitute a front; fronts are executed sequentially in order of increasing dates, while operations within a front are executed in parallel where possible. Depending on the nature of the program, this may entail some overhead, which can be reduced to zero for polyhedral programs. We show by experiments that, at least for the current X10 runtime, this transformation usually improves the performance of our benchmarks. Besides its theoretical interest, this transformation may help simplify a compiler or runtime library.
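
The date-and-front idea can be illustrated with a small sketch. It is written in Python rather than X10 and is not the authors' transformation: threading.Barrier merely stands in for an X10 clock, and the names N, T, step, with_clock and without_clock, as well as the toy neighbour computation, are assumptions made for illustration. Each activity performs one operation per date; the clock-free version runs the fronts one after another and the operations of each front in parallel.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

N, T = 4, 3  # hypothetical sizes: N activities, T advances per activity


def step(prev, i):
    """One operation: combine the two neighbours computed in the previous front."""
    left = prev[i - 1] if i > 0 else 0
    right = prev[i + 1] if i < N - 1 else 0
    return left + right + 1


def with_clock():
    """Clocked version: N long-lived activities synchronise on a barrier."""
    grid = [[0] * N for _ in range(T + 1)]
    clock = threading.Barrier(N)  # stand-in for an X10 clock (assumption)

    def activity(i):
        for t in range(T):  # the operation at iteration t has date t
            grid[t + 1][i] = step(grid[t], i)
            clock.wait()  # plays the role of Clock.advanceAll()

    threads = [threading.Thread(target=activity, args=(i,)) for i in range(N)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return grid[T]


def without_clock():
    """Clock-free version: fronts run in date order, each front in parallel."""
    grid = [[0] * N for _ in range(T + 1)]
    with ThreadPoolExecutor(max_workers=N) as pool:
        for date in range(T):  # sequential loop over increasing dates
            # list() waits for every operation of the front before moving on
            grid[date + 1] = list(pool.map(lambda i: step(grid[date], i), range(N)))
    return grid[T]


print(with_clock(), without_clock())
```

Both functions compute the same final row. In the second one, the synchronisation that the barrier provided is implied by the sequential loop over dates, which is the role the abstract gives to fronts.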



Author information

Authors and Affiliations

INRIA, UCBL, CNRS & École Normale Supérieure de Lyon, LIP, Compsys, France
Paul Feautrier
INRIA & Université de Strasbourg, France
Éric Violard & Alain Ketterlin

Editor information

Editors and Affiliations

Département d’Informatique, INRIA and École Normale Supérieure, 45 rue d’Ulm, 75005, Paris, France
Albert Cohen

Cite this paper

Feautrier, P., Violard, É., Ketterlin, A. (2014). Improving the Performance of X10 Programs by Clock Removal. In: Cohen, A. (ed.) Compiler Construction. CC 2014. Lecture Notes in Computer Science, vol 8409. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54807-9_7

DOI: https://doi.org/10.1007/978-3-642-54807-9_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54806-2
Online ISBN: 978-3-642-54807-9
eBook Packages: Computer Science, Computer Science (R0)
