Task Partitioning for Multi-core Network Processors

  • Robert Ennals
  • Richard Sharp
  • Alan Mycroft
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3443)


Network processors (NPs) typically contain multiple concurrent processing cores. State-of-the-art programming techniques for NPs are invariably low-level, requiring programmers to partition code into concurrent tasks early in the design process. This results in programs that are hard to maintain and hard to port to alternative architectures. This paper presents a new approach in which a high-level program is separated from its partitioning into concurrent tasks. Designers write their programs in a high-level, domain-specific, architecturally-neutral language, but also provide a separate Architecture Mapping Script (AMS). An AMS specifies semantics-preserving transformations that are applied to the program to re-arrange it into a set of tasks appropriate for execution on a particular target architecture. We (i) describe three such transformations: pipeline introduction, pipeline elimination and queue multiplexing; and (ii) specify when each can be safely applied.

As a case study, we describe an IP packet forwarder and present an AMS that partitions it into a form capable of running at 3 Gb/s on an Intel IXP2400 network processor.
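The idea that a partitioning can change while the program's meaning stays fixed can be illustrated with a small sketch. The code below is not the authors' language or AMS notation; it is plain Python, and the stage functions `parse` and `route`, the packet fields, and the tagging scheme are all hypothetical names chosen for illustration. The tasks are run one after another rather than concurrently, so the sketch only shows that the input/output behaviour is unchanged, which is the property a semantics-preserving transformation must guarantee. The queue multiplexing part reflects one plausible reading of that term (several logical queues sharing one physical queue).

```python
from collections import deque

# Hypothetical processing stages; not taken from the paper.
def parse(pkt):
    """Stage 1: decrement TTL and keep the destination."""
    return {"ttl": pkt["ttl"] - 1, "dst": pkt["dst"]}

def route(hdr):
    """Stage 2: pick an output port (assumed 4 ports) for the header."""
    return (hdr["dst"] % 4, hdr["ttl"])

def single_task(packets):
    """Unpartitioned program: one task runs both stages per packet."""
    return [route(parse(p)) for p in packets]

def pipelined(packets):
    """Sketch of pipeline introduction: two tasks linked by a FIFO queue.
    Reading this function back into single_task is pipeline elimination."""
    q = deque()
    for p in packets:            # task A: producer
        q.append(parse(p))
    out = []
    while q:                     # task B: consumer
        out.append(route(q.popleft()))
    return out

def multiplexed(packets_a, packets_b):
    """Sketch of queue multiplexing: two logical queues share one
    physical queue, with a tag identifying the logical queue."""
    shared = deque()
    for p in packets_a:
        shared.append(("a", parse(p)))
    for p in packets_b:
        shared.append(("b", parse(p)))
    out = {"a": [], "b": []}
    while shared:
        tag, hdr = shared.popleft()
        out[tag].append(route(hdr))
    return out["a"], out["b"]
```

In this sketch, `pipelined(pkts)` and `single_task(pkts)` return the same list for any input, and `multiplexed(a, b)` returns the same results as two separate pipelines would. A real transformation would also have to account for concurrency and shared state, which is where the safety conditions mentioned in the abstract come in.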


Keywords (machine-generated, not supplied by the authors): Global Variable; Concurrent Task; Concurrent Program; Source Program; Target Architecture



Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Robert Ennals (1)
  • Richard Sharp (1)
  • Alan Mycroft (2)
  1. Intel Research Cambridge, Cambridge, UK
  2. Computer Laboratory, Cambridge University, Cambridge, UK
