Abstract
X10 is a promising recent parallel language designed specifically to address the challenges of productively programming a wide variety of target platforms. The sequential core of X10 is an object-oriented language in the Java family. This core is augmented by a few parallel constructs that create activities, as a generalization of the well-known fork/join model. Clocks are a generalization of the familiar barriers. Synchronization on a clock is specified by the Clock.advanceAll() method call. Activities that execute an advance stall until all existing activities have done the same, and are then released at the same (logical) time.
This naturally raises the following question: are clocks strictly necessary for X10 programs? Surprisingly enough, the answer is no, at least for sufficiently regular programs. One assigns a date to each operation, denoting the number of advances that the activity has executed before the operation. Operations with the same date constitute a front. Fronts are executed sequentially in order of increasing dates, while the operations within a front are executed in parallel if possible. Depending on the nature of the program, this may entail some overhead, which can be reduced to zero for polyhedral programs. We show by experiments that, at least for the current X10 runtime, this transformation usually improves the performance of our benchmarks. Besides its theoretical interest, this transformation may be of interest for simplifying a compiler or runtime library.
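The dating scheme described above can be illustrated with a small sketch. It is written in Python rather than X10, and it is not the authors' implementation: the `ADVANCE` sentinel (standing in for a `Clock.advanceAll()` call), the helper names `build_fronts` and `run_fronts`, and the representation of an activity as a flat list of operations are all assumptions made for illustration. The sketch also ignores dynamic activity creation and clock registration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical marker standing in for a Clock.advanceAll() call.
ADVANCE = object()


def build_fronts(activities):
    """Group operations by date (number of advances executed before them).

    Returns a list of (date, {activity_id: [ops]}) pairs sorted by date.
    Operations of one activity inside a front keep their program order.
    """
    fronts = defaultdict(lambda: defaultdict(list))
    for aid, ops in enumerate(activities):
        date = 0
        for op in ops:
            if op is ADVANCE:
                date += 1  # every advance moves the activity to the next front
            else:
                fronts[date][aid].append(op)
    return sorted(fronts.items())


def run_fronts(fronts, execute):
    """Run fronts one after another; segments inside a front run in parallel."""
    with ThreadPoolExecutor() as pool:
        for date, front in fronts:
            def run_segment(item, date=date):
                aid, ops = item
                for op in ops:  # sequential within a single activity
                    execute(date, aid, op)
            # list() waits for the whole front before the next one starts,
            # which plays the role of the removed clock synchronization.
            list(pool.map(run_segment, front.items()))


# Two toy activities; operations are plain labels.
activities = [
    ["a0", ADVANCE, "a1", "a2"],
    ["b0", ADVANCE, ADVANCE, "b2"],
]
trace = []
run_fronts(build_fronts(activities), lambda d, a, op: trace.append((d, op)))
dates = [d for d, _ in trace]
```

In this toy run, `a0` and `b0` form front 0, `a1` and `a2` form front 1, and `b2` forms front 2. The order of operations inside front 0 is not fixed, but no operation of a later front starts before the earlier front has finished.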
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Feautrier, P., Violard, É., Ketterlin, A. (2014). Improving the Performance of X10 Programs by Clock Removal. In: Cohen, A. (ed.) Compiler Construction. CC 2014. Lecture Notes in Computer Science, vol 8409. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54807-9_7
DOI: https://doi.org/10.1007/978-3-642-54807-9_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54806-2
Online ISBN: 978-3-642-54807-9
eBook Packages: Computer Science, Computer Science (R0)