Distributed Stream Processing with DUP

  • Kai Christian Bader
  • Tilo Eißler
  • Nathan Evans
  • Chris GauthierDickey
  • Christian Grothoff
  • Krista Grothoff
  • Jeff Keene
  • Harald Meier
  • Craig Ritzdorf
  • Matthew J. Rutherford
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6289)

Abstract

This paper introduces the DUP System, a simple framework for parallel stream processing. The DUP System enables developers to compose applications from stages written in almost any programming language and to run distributed streaming applications across all POSIX-compatible platforms. Parallel applications written with the DUP System do not suffer from many of the problems that exist in traditional parallel languages. The DUP System includes a range of simple stages that serve as general-purpose building blocks for larger applications. This work describes the DUP Assembly language, the DUP architecture and some of the stages included in the DUP run-time library. We then present our experiences with parallelizing and distributing the ARB project, a package of tools for RNA/DNA sequence database handling and analysis.
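The idea of composing an application from independent stages connected by streams can be illustrated with a minimal sketch. This is not DUP Assembly and does not use the DUP run-time; it only assumes that each stage is an ordinary process reading standard input and writing standard output, joined by operating-system pipes. The stage bodies (`UPPER`, `KEEP_A`) and the helper `run_pipeline` are hypothetical names introduced for this example.

```python
import subprocess
import sys
import threading

# Hypothetical stages: each is a standalone program that reads stdin and
# writes stdout. They are Python one-liners here only to keep the sketch
# self-contained; any executable with the same behaviour would do.
UPPER = "import sys\nfor line in sys.stdin:\n    sys.stdout.write(line.upper())\n"
KEEP_A = "import sys\nfor line in sys.stdin:\n    if 'A' in line:\n        sys.stdout.write(line)\n"


def run_pipeline(stage_sources, data):
    """Run one process per stage, chained by pipes, and return the final output."""
    procs = []
    upstream = None
    for index, source in enumerate(stage_sources):
        proc = subprocess.Popen(
            [sys.executable, "-c", source],
            stdin=subprocess.PIPE if index == 0 else upstream,
            stdout=subprocess.PIPE,
        )
        if upstream is not None:
            # The child now owns the read end; close the parent's copy.
            upstream.close()
        upstream = proc.stdout
        procs.append(proc)

    def feed():
        # Write the input on a separate thread so that reading the final
        # output cannot deadlock when the data exceeds the pipe buffer.
        procs[0].stdin.write(data.encode())
        procs[0].stdin.close()

    writer = threading.Thread(target=feed)
    writer.start()
    output = procs[-1].stdout.read().decode()
    writer.join()
    for proc in procs:
        proc.wait()
    return output


if __name__ == "__main__":
    print(run_pipeline([UPPER, KEEP_A], "apple\nkiwi\nbanana\n"), end="")
```

All stages run concurrently as separate processes, so the parallelism comes from the pipeline structure rather than from threads or locks inside any one stage.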

Keywords

Coordination language, parallel programming, productivity

Copyright information

© IFIP International Federation for Information Processing 2010

Authors and Affiliations

  • Kai Christian Bader (1)
  • Tilo Eißler (1)
  • Nathan Evans (1)
  • Chris GauthierDickey (2)
  • Christian Grothoff (1)
  • Krista Grothoff (1)
  • Jeff Keene (2)
  • Harald Meier (1)
  • Craig Ritzdorf (2)
  • Matthew J. Rutherford (2)

  1. Faculty of Informatics, Technische Universität München, Germany
  2. Department of Computer Science, University of Denver, USA