Automatic Pipeline Construction for Real-Time Annotation

  • Henning Wachsmuth
  • Mirko Rose
  • Gregor Engels
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7816)


Many annotation tasks in computational linguistics are tackled with manually constructed pipelines of algorithms. In real-time tasks where information needs are stated and addressed ad-hoc, however, manual construction is infeasible. This paper presents an artificial intelligence approach to automatically construct annotation pipelines for given information needs and quality prioritizations. Based on an abstract ontological model, we use partial order planning to select a pipeline’s algorithms and informed search to obtain an efficient pipeline schedule. We realized the approach as an expert system on top of Apache UIMA, which offers evidence that pipelines can be constructed ad-hoc in near-zero time.
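The two steps named in the abstract, selecting algorithms for an information need and then ordering them efficiently, can be illustrated with a small sketch. This is not the authors' planner or their UIMA-based expert system. It swaps in simple backward chaining for partial order planning and a greedy cheapest-first ordering for informed search. The `Algorithm` class, the repository entries, the annotation type names, and the run-time values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Algorithm:
    name: str
    inputs: frozenset    # annotation types the algorithm requires
    outputs: frozenset   # annotation types the algorithm produces
    runtime: float       # hypothetical run-time estimate (arbitrary units)


def select_algorithms(goal_types, repository):
    """Backward chaining: find a producer for every required annotation type."""
    selected, agenda, seen = [], list(goal_types), set()
    while agenda:
        needed = agenda.pop()
        if needed in seen:
            continue
        seen.add(needed)
        if any(needed in a.outputs for a in selected):
            continue  # an already selected algorithm covers this type
        producers = [a for a in repository if needed in a.outputs]
        if not producers:
            raise ValueError(f"no algorithm produces {needed!r}")
        # Prioritize efficiency: take the fastest producer (assumed criterion).
        best = min(producers, key=lambda a: (a.runtime, a.name))
        selected.append(best)
        agenda.extend(best.inputs)
    return selected


def schedule(selected):
    """Greedy ordering: always run the cheapest algorithm whose inputs exist."""
    available, pipeline, remaining = set(), [], list(selected)
    while remaining:
        runnable = [a for a in remaining if a.inputs <= available]
        if not runnable:
            raise ValueError("dependencies cannot be satisfied")
        nxt = min(runnable, key=lambda a: (a.runtime, a.name))
        pipeline.append(nxt)
        available |= nxt.outputs
        remaining.remove(nxt)
    return pipeline


def alg(name, inputs, outputs, runtime):
    return Algorithm(name, frozenset(inputs), frozenset(outputs), runtime)


# Hypothetical algorithm repository; plain text is assumed to be given.
REPOSITORY = [
    alg("tokenizer", [], ["Token"], 1.0),
    alg("slow_tokenizer", [], ["Token"], 5.0),
    alg("sentence_splitter", [], ["Sentence"], 0.5),
    alg("pos_tagger", ["Token", "Sentence"], ["POS"], 3.0),
    alg("time_recognizer", ["Token"], ["Time"], 2.0),
    alg("money_recognizer", ["Token"], ["Money"], 2.5),
    alg("forecast_extractor", ["Time", "Money", "Sentence"], ["Forecast"], 4.0),
]

pipeline = schedule(select_algorithms({"Forecast"}, REPOSITORY))
print([a.name for a in pipeline])
```

In this toy repository, the information need `Forecast` pulls in only the algorithms it transitively depends on, so `pos_tagger` and `slow_tokenizer` are left out. A real planner would also have to weigh effectiveness against run-time according to the stated quality prioritization, which this sketch reduces to a single run-time estimate.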


Keywords: Expert System · Information Type · Input Text · Annotation Pipeline · Annotation Type
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Henning Wachsmuth (1)
  • Mirko Rose (1)
  • Gregor Engels (1)
  1. s-lab – Software Quality Lab, Universität Paderborn, Paderborn, Germany
