A Declarative Pipeline Language for Complex Data Analysis

  • Henning Christiansen
  • Christian Theil Have
  • Ole Torp Lassen
  • Matthieu Petit
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7844)

Abstract

We introduce BANpipe, a logic-based scripting language designed to model complex compositions of time-consuming analyses. Its declarative semantics is described together with alternative operational semantics that facilitate goal-directed execution, parallel execution, change propagation, and type checking. A portable implementation is provided; it supports expressing complex pipelines that may integrate different Prolog systems, and it manages files automatically.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Henning Christiansen (1)
  • Christian Theil Have (1)
  • Ole Torp Lassen (1)
  • Matthieu Petit (1)

  1. Research group PLIS: Programming, Logic and Intelligent Systems, Department of Communication, Business and Information Technologies, Roskilde University, Roskilde, Denmark
