Extracting and Improving Microarchitecture Performance on Reconfigurable Architectures

  • Shobana Padmanabhan
  • Phillip Jones
  • David V. Schuehler
  • Scott J. Friedman
  • Praveen Krishnamurthy
  • Huakai Zhang
  • Roger Chamberlain
  • Ron K. Cytron
  • Jason Fritts
  • John W. Lockwood


Applications for constrained embedded systems require careful attention to the match between the application and the support offered by an architecture, at the ISA and microarchitecture levels. Generic processors, such as ARM and PowerPC, are inexpensive, but with respect to a given application, they often overprovision in areas that are unimportant for the application’s performance. Moreover, while application-specific, customized logic could dramatically improve the performance of an application, that approach is typically too expensive to justify for most applications. In this paper, we describe our experience using reconfigurable architectures to develop an understanding of an application’s performance and to enhance its performance with respect to customized, constrained logic. We begin with a standard ISA currently in use for embedded systems. We modify its core to measure performance characteristics, obtaining a system that provides cycle-accurate timings and presents results in the style of gprof, but with absolutely no software overhead. We then provide cache-behavior statistics that are typically unavailable in a generic processor. In contrast with simulation, our approach executes the program at full speed and delivers statistics based on the actual behavior of the cache subsystem. Finally, in response to the performance profile developed on our platform, we evaluate various uses of the FPGA-realized instruction and data caches in terms of the application’s performance.


Keywords: reconfigurable architecture, performance, cycle-accurate hardware profiling





Copyright information

© Springer Science+Business Media, Inc. 2005

Authors and Affiliations

All ten authors: Department of Computer Science and Engineering, Washington University, St. Louis, USA
