A Compiler Directed Approach to Hiding Configuration Latency in Chameleon Processors

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1896)

Abstract

The Chameleon CS2112 chip is the industry’s first reconfigurable communication processor. To attain high performance, such a processor must effectively tolerate reconfiguration latency. In this paper, we present a compiler directed approach to hiding the configuration loading latency. We integrate multithreading, instruction scheduling, register allocation, and prefetching techniques to tolerate this latency. In addition, configuration loading is overlapped with communication to further enhance performance. By running kernel programs on a cycle-accurate simulator, we show that chip performance is significantly improved by leveraging these compiler and multithreading techniques.
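
As a rough illustration of why prefetching a configuration can hide its loading latency, the sketch below compares two schedules in a toy cycle-count model. It is not the paper's algorithm or simulator. The function names, the (load, run) cycle pairs, and the assumption of a single background slot that can load the next configuration while the current kernel runs are all hypothetical, chosen only to make the overlap visible.

```python
# Toy model (hypothetical, not from the paper): each kernel is a pair
# (load_cycles, run_cycles). We assume one background slot, so the next
# configuration can start loading only when the current kernel starts running.

def total_cycles_serial(kernels):
    """Every kernel waits for its own configuration load, then runs."""
    return sum(load + run for load, run in kernels)


def total_cycles_prefetch(kernels):
    """Load kernel i+1's configuration while kernel i is executing."""
    if not kernels:
        return 0
    # The first load has nothing to overlap with, so it is fully exposed.
    total = kernels[0][0]
    for i, (_, run) in enumerate(kernels):
        next_load = kernels[i + 1][0] if i + 1 < len(kernels) else 0
        # The next kernel starts once both the current run and the
        # prefetched load have finished, whichever takes longer.
        total += max(run, next_load)
    return total


if __name__ == "__main__":
    kernels = [(100, 300), (100, 50), (100, 400)]  # made-up cycle counts
    print("serial:  ", total_cycles_serial(kernels))
    print("prefetch:", total_cycles_prefetch(kernels))
```

In this model a load is fully hidden only when the preceding kernel runs at least as long as the load takes. The short second kernel in the made-up input leaves part of the third load exposed, which is the kind of gap that scheduling other useful work alongside the load would aim to fill.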




Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tang, X., Aalsma, M., Jou, R. (2000). A Compiler Directed Approach to Hiding Configuration Latency in Chameleon Processors. In: Hartenstein, R.W., Grünbacher, H. (eds) Field-Programmable Logic and Applications: The Roadmap to Reconfigurable Computing. FPL 2000. Lecture Notes in Computer Science, vol 1896. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44614-1_4

  • DOI: https://doi.org/10.1007/3-540-44614-1_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-67899-1

  • Online ISBN: 978-3-540-44614-9

  • eBook Packages: Springer Book Archive
