European Conference on Parallel Processing

Euro-Par 2012: Euro-Par 2012 Parallel Processing, pp. 102–114

CRAW/P: A Workload Partition Method for the Efficient Parallel Simulation of Manycores

  • Shuai Jiao (1, 2)
  • Paolo Ienne (3)
  • Xiaochun Ye (1)
  • Da Wang (1)
  • Dongrui Fan (1)
  • Ninghui Sun (1)

Conference paper

Part of the Lecture Notes in Computer Science book series (LNTCS, volume 7484)

Abstract

This paper addresses workload partition strategies for the simulation of manycore architectures. The key observation behind this paper is that, compared with traditional multicores, manycores feature more non-uniform memory access and unpredictable network traffic; when static workload partition schemes are used, these features degrade the simulation speed and accuracy of Parallel Discrete Event Simulators (PDES). Based on this observation, we propose an adaptive workload partition method: Core/Router-Adaptive Workload Partition (CRAW/P). The method delivers more speedup and accuracy than static partition schemes by partitioning the simulation of the on-chip network independently from that of the cores and by synchronizing the two differently. Using a PDES simulator, we evaluate the performance of CRAW/P in simulating a 256-core general-purpose manycore processor. Experimental results on SPLASH-2 benchmark applications show that CRAW/P improves simulation speed by 28%–67% over a static partition scheme and reduces timing errors to below 10% even in very relaxed simulation (a quantum size of 64).
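The abstract ties accuracy to the synchronization quantum: in relaxed (quantum-based) parallel simulation, partitions run independently between barriers, so an event produced inside a quantum may be observed late by another partition. The sketch below is a simplified model of that effect. It assumes that time is measured in simulated cycles and that messages become visible to other partitions only at the next quantum barrier. The function names and this visibility rule are hypothetical illustrations, not CRAW/P's actual synchronization mechanism.

```python
def delivery_cycle(send_cycle: int, latency: int, quantum: int) -> int:
    """Cycle at which the receiving partition observes a message.

    Hypothetical, simplified model of quantum-based (relaxed) synchronization:
    partitions exchange messages only at quantum barriers, i.e. at cycles that
    are multiples of `quantum`. A message whose ideal arrival time falls before
    the next barrier is therefore observed at that barrier instead.
    """
    if quantum < 1 or latency < 0 or send_cycle < 0:
        raise ValueError("quantum must be >= 1; latency and send_cycle must be >= 0")
    ideal_arrival = send_cycle + latency
    # Smallest multiple of `quantum` strictly greater than the send cycle.
    next_barrier = (send_cycle // quantum + 1) * quantum
    return max(ideal_arrival, next_barrier)


def timing_error(send_cycle: int, latency: int, quantum: int) -> int:
    """Extra delay (in cycles) introduced by the relaxed synchronization."""
    return delivery_cycle(send_cycle, latency, quantum) - (send_cycle + latency)


if __name__ == "__main__":
    # A 4-cycle message (e.g. a short on-chip network hop) sent at cycle 10.
    for q in (1, 8, 64):
        print(f"quantum={q:>2}: error={timing_error(10, 4, q)} cycles")
```

Under this model, a quantum of 1 reduces to cycle-by-cycle synchronization and the error is 0. With a quantum of 8 the message arrives 2 cycles late, and with a quantum of 64 it arrives 50 cycles late. Short-latency network messages are the most exposed. That is one plausible motivation for synchronizing router simulation differently from core simulation, although the paper's own policy may differ.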

Keywords

  • Parallel Simulation
  • Manycore
  • Multicore
  • Workload Partition

Author information

Authors and Affiliations

  1. SKL Computer Architecture, ICT, CAS, Beijing, P.R. China

    Shuai Jiao, Xiaochun Ye, Da Wang, Dongrui Fan & Ninghui Sun

  2. Graduate University of Chinese Academy of Sciences, Beijing, P.R. China

    Shuai Jiao

  3. École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland

    Paolo Ienne

Editor information

Editors and Affiliations

  1. University of Patras, Computer Technology Institute and Press “Diophantus”, N. Kazantzaki, 26504, Rio, Greece

    Christos Kaklamanis

  2. University of Patras, University Building B, 26504, Rio, Greece

    Theodore Papatheodorou

  3. Computer Technology Institute and Press “Diophantus”, University of Patras, N. Kazantzaki, 26504, Rio, Greece

    Paul G. Spirakis

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jiao, S., Ienne, P., Ye, X., Wang, D., Fan, D., Sun, N. (2012). CRAW/P: A Workload Partition Method for the Efficient Parallel Simulation of Manycores. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds) Euro-Par 2012 Parallel Processing. Euro-Par 2012. Lecture Notes in Computer Science, vol 7484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32820-6_12

  • DOI: https://doi.org/10.1007/978-3-642-32820-6_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32819-0

  • Online ISBN: 978-3-642-32820-6

  • eBook Packages: Computer Science, Computer Science (R0)
