Workflows and Components of Bioimage Analysis

This chapter introduces definitions of three types of bioimage analysis software: component, collection, and workflow. The aim is to promote the structured design of bioimage analysis methods and to improve related learning and teaching.


Introduction
Software tools used for bioimage analysis tend to be seen as utilities that solve problems off the shelf. The extreme version of this view is: "If I know where to click, I can get good results!" With gaming software, the more familiar the user becomes with the software, the faster the user reaches the final stage. To some extent this may also be true of bioimage analysis software, but there is a big difference. Because bioimage analysis is part of scientific research, the goal is not to clear a common final stage that everyone heads toward, but to reach something original that others have not yet found. The difficulty of using bioimage analysis software therefore lies not only in its hidden commands, but also in the fact that the user needs to come up with a more or less original analysis. How, then, can we do something original using tools that are publicly available?
In this short chapter, we define several terms describing the world of bioimage analysis software, namely "workflows", "components", and "collections", and explain their relationships. We believe that clarifying these definitions can greatly help those who want to learn bioimage analysis, as well as those who need to design its teaching. The reason is that these terms link the generality of publicly available software packages with the specificity and originality of the analysis that one needs to achieve.

Types of Bioimage Analysis Software
Software packages such as ImageJ (Schneider et al. 2012), MATLAB, CellProfiler (Carpenter et al. 2006) or ICY (de Chaumont et al. 2012) are often used to analyze image data in the life sciences. These software packages are "collections" of implementations of image processing and analysis algorithms. Libraries such as ImgLib2 (Pietzsch et al. 2012), OpenCV (Bradski 2000), ITK (Johnson et al. 2015a,b), VTK (Schroeder et al. 2006), and Scikit-Image (van der Walt et al. 2014) are also packages of image processing and analysis algorithms, although they are accessed through a programming interface rather than a graphical one. We refer to all of them as "collections". To scientifically analyze and address an underlying biological problem, one needs to hand-pick some algorithms from these collections, carefully adjust their functional parameters to the problem, and assemble them in a meaningful order. Such a sequence of image processing algorithms with a specified parameter set is what we call a "workflow". The implementations of the algorithms that are used in a workflow are the "components" constituting that workflow (or "workflow components"). From the point of view of the expert who needs to assemble a workflow, a collection is a package bundling many different components. For example, many plugins offered for ImageJ are themselves collections (e.g. TrackMate (Tinevez et al. 2016), 3D Suite (Ollion et al. 2013), MosaicSuite …), as they bundle multiple components. On the other hand, some plugins, such as the Linear Kuwahara filter plugin, are a single component implemented as a single plugin.
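The three definitions above can be made concrete with a minimal sketch. It is only an illustration under stated assumptions: the functions `box_blur`, `threshold` and `count_objects`, the `COLLECTION` dictionary and the tiny test image are hypothetical pure-Python stand-ins, not the API of ImageJ or of any other package named in this chapter.

```python
from typing import Any, Callable, Dict, List, Tuple

Image = List[List[float]]

# --- Components: single implementations of single algorithms (hypothetical) ---
def box_blur(image: Image, radius: int = 1) -> Image:
    """Mean filter; the window is clipped at the image borders."""
    height, width = len(image), len(image[0])
    out = []
    for r in range(height):
        row = []
        for c in range(width):
            values = [image[y][x]
                      for y in range(max(0, r - radius), min(height, r + radius + 1))
                      for x in range(max(0, c - radius), min(width, c + radius + 1))]
            row.append(sum(values) / len(values))
        out.append(row)
    return out

def threshold(image: Image, level: float) -> List[List[bool]]:
    """Binary mask of the pixels strictly brighter than `level`."""
    return [[value > level for value in row] for row in image]

def count_objects(mask: List[List[bool]]) -> int:
    """Count 4-connected foreground regions with a simple flood fill."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    count = 0
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return count

# --- Collection: a bundle of components, with no order and no parameters ---
COLLECTION: Dict[str, Callable[..., Any]] = {
    "box_blur": box_blur,
    "threshold": threshold,
    "count_objects": count_objects,
}

# --- Workflow: chosen components, in a chosen order, with chosen parameters ---
WORKFLOW: List[Tuple[str, Dict[str, Any]]] = [
    ("box_blur", {"radius": 1}),
    ("threshold", {"level": 0.4}),
    ("count_objects", {}),
]

def run_workflow(image: Any, workflow, collection) -> Any:
    """Apply each component in turn, feeding its output to the next one."""
    result = image
    for name, params in workflow:
        result = collection[name](result, **params)
    return result

# Two 2x2 objects plus one isolated bright pixel (noise) at row 2, column 6.
IMAGE: Image = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

print(run_workflow(IMAGE, WORKFLOW, COLLECTION))  # prints 2
```

In this sketch the collection alone says nothing about what to do with an image; only the workflow fixes the choice, order and parameters. Dropping the blurring step from the workflow makes the isolated noise pixel count as a third object, which illustrates why the selection of components matters.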
Each workflow is uniquely associated with a specific biological research project, because the question asked therein as well as the acquired image quality are often unique. This calls for a unique combination of components and parameter set. Some collections, especially those designed with a GUI, offer workflow templates. These templates are preassembled sequences of image processing tasks that solve a typical bioimage analysis problem; all one needs to do is to adjust the parameters of each step. For example, in the case of the TrackMate plugin for ImageJ (Tinevez et al. 2016), a GUI wizard guides the user in choosing an algorithm for each step among several candidates, and in adjusting its parameters, to achieve a successful particle tracking workflow (see Chap. 4). When these algorithms and parameters are set, the workflow is built. CellProfiler also has a helpful GUI that assists the user in building a workflow based on workflow templates (Carpenter et al. 2006). It allows the user to easily swap the algorithms for each step and test various parameter combinations. Figure 1.1 summarizes the above explanations.
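The idea of a workflow template can be sketched as follows: the order of the steps is fixed, and the user only picks one candidate algorithm per step and sets its parameters. This is a hedged illustration on a one-dimensional intensity profile; `TEMPLATE`, `build_workflow` and the candidate functions are hypothetical names and do not reproduce the TrackMate or CellProfiler interfaces.

```python
from statistics import mean, median
from typing import Any, Callable, Dict, Iterator, List, Tuple

Profile = List[float]

def _windows(profile: Profile, window: int) -> Iterator[Profile]:
    """Yield the neighbourhood of each sample, clipped at both ends."""
    half = window // 2
    for i in range(len(profile)):
        yield profile[max(0, i - half): i + half + 1]

# Candidate components for the "smooth" step
def mean_smooth(profile: Profile, window: int = 3) -> Profile:
    return [mean(w) for w in _windows(profile, window)]

def median_smooth(profile: Profile, window: int = 3) -> Profile:
    return [median(w) for w in _windows(profile, window)]

# Candidate component for the "detect" step
def count_runs_above(profile: Profile, level: float) -> int:
    """Count contiguous runs of samples strictly above `level`."""
    runs, inside = 0, False
    for value in profile:
        above = value > level
        if above and not inside:
            runs += 1
        inside = above
    return runs

# Template: fixed sequence of steps, several candidate algorithms per step.
TEMPLATE: List[Tuple[str, Dict[str, Callable[..., Any]]]] = [
    ("smooth", {"mean": mean_smooth, "median": median_smooth}),
    ("detect", {"runs_above": count_runs_above}),
]

def build_workflow(template, choices: Dict[str, Tuple[str, Dict[str, Any]]]):
    """Turn a template plus per-step choices into a callable workflow."""
    steps = []
    for step_name, candidates in template:
        algorithm, params = choices[step_name]
        steps.append((candidates[algorithm], params))

    def workflow(data: Any) -> Any:
        for component, params in steps:
            data = component(data, **params)
        return data

    return workflow

# A single-sample spike (index 2) followed by one broad, genuine peak.
PROFILE: Profile = [0, 0, 9, 0, 0, 3, 4, 5, 4, 3, 0, 0]

with_median = build_workflow(TEMPLATE, {
    "smooth": ("median", {"window": 3}),
    "detect": ("runs_above", {"level": 2.0}),
})
with_mean = build_workflow(TEMPLATE, {
    "smooth": ("mean", {"window": 3}),
    "detect": ("runs_above", {"level": 2.0}),
})

print(with_median(PROFILE))  # prints 1
print(with_mean(PROFILE))    # prints 2
```

Swapping only the algorithm of the smoothing step changes the result: the median filter suppresses the spike, whereas the mean filter spreads it into a second detected run. A template removes the assembly work, but the choice among candidates still depends on the data and the question.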
Though such templates are available for some typical tasks, collections generally do not provide helpful clues for constructing a workflow: the choice of components and the approach taken to assemble them depend on expert knowledge, empirical knowledge, or testing. Since biological questions are so diverse, the workflow often needs to be original and might not match any available workflow template. Building a workflow from scratch requires solid knowledge of the components and of the ways to combine them. It also requires an understanding of the biological problem itself. Each workflow is in essence associated with a specific biological question, and this question, together with the image acquisition setup, affects the required precision of the analysis. For example, image data in general should not be analyzed at a precision higher than the physical resolution of the imaging system that captured those data. In some cases, a higher precision does not yield more meaningful results, simply because such precision is irrelevant to the biological question. These aspects should be carefully considered when planning the analysis and choosing the components, together with the choice of statistical treatment.
Many biologists find it difficult to analyze image data because they lack the skills and knowledge needed to close the gap between a collection of components and a practical workflow. A collection bundles components without workflows, but it is often erroneously assumed that installing a collection is enough to solve a bioimage analysis problem. The truth is that expert knowledge is required to choose components, adjust their parameters, and build a workflow (Fig. 1.1, red arrows). Correctly assembling components into an executable script is in general even more difficult, as it requires some programming skills. Using components directly from library-type collections, which host many useful components, also requires programming skills to access their API. Bioimage analysts may fill this gap, but even they, who analyze image data professionally, always need to search for the most suitable components to solve a problem, reach the required accuracy, or cope with huge data in a practical time.
Another important aspect, and difficulty, is the reproducibility of workflows. We often want to know how other people have performed image analysis and to learn new bioimage analysis strategies from them. In such cases, we look for workflows addressing a similar biological problem. However, many articles do not document the workflows they used in sufficient detail for the results to be reproduced. As an extreme example, we found articles whose image analysis description in Materials and Methods merely states that ImageJ was used for the image analysis. Such minimalism should be strictly avoided. On the other hand, some publications describe their workflows as detailed text in the Materials and Methods section. We go even further and recommend publishing workflows as executable scripts, i.e. computer programs, with documented parameter sets for clarity and reproducibility of analysis and results. In our opinion, the best format is a version-tracked script, because the version used for the published results can be clearly stated and reused by others. A script embedded in a Docker image is even better, as it avoids problems associated with differences in execution environments.

Fig. 1.1 Relationship between components, collection and workflow. Components (e.g. Gaussian blurring filter) are selected from a collection (e.g. ImageJ) and assembled into a specific workflow (red arrow) for analyzing image data in each research project (e.g. scripts associated with journal papers)

Towards a more efficient design of workflows, the Network of European Bioimage Analysts (NEUBIAS) has been developing a searchable index named the Bioimage Informatics Search Engine (BISE). This service is accessible online at https://biii.eu and hosts a manually curated registry of collections, workflows and components.
Two ontologies are used for annotating resources registered in BISE: the BISE ontology, for properties of resources such as the programming language; and the EDAM Bioimaging Ontology (Kalaš et al. 2019), an extension of the EDAM ontology (Ison et al. 2013) developed together with ELIXIR, for applications of these resources, such as the image processing step and the imaging modality. "Component", "Workflow" and "Collection" are implemented as part of the BISE ontology to classify the type of software, allowing more distinctive filtering of search results.
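How a type classification sharpens search can be sketched with a toy in-memory registry. This is not the BISE data model or its query interface: the `Resource` class, the `search` function, the operation labels (which are illustrative strings, not actual EDAM Bioimaging terms) and the entry `example-tracking-script` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

@dataclass(frozen=True)
class Resource:
    name: str
    kind: str                    # "collection", "component" or "workflow"
    language: str                # a property of the resource itself
    operations: FrozenSet[str]   # what the resource is applied to (illustrative labels)

# Toy registry; the last entry is a hypothetical workflow script.
REGISTRY: List[Resource] = [
    Resource("ImageJ", "collection", "Java", frozenset({"filtering", "segmentation"})),
    Resource("TrackMate", "collection", "Java", frozenset({"tracking"})),
    Resource("Linear Kuwahara filter", "component", "Java", frozenset({"filtering"})),
    Resource("example-tracking-script", "workflow", "Python", frozenset({"tracking"})),
]

def search(registry: List[Resource],
           kind: Optional[str] = None,
           operation: Optional[str] = None) -> List[str]:
    """Return names of resources matching every criterion that is given."""
    hits = []
    for resource in registry:
        if kind is not None and resource.kind != kind:
            continue
        if operation is not None and operation not in resource.operations:
            continue
        hits.append(resource.name)
    return hits

print(search(REGISTRY, operation="tracking"))
print(search(REGISTRY, kind="workflow", operation="tracking"))
```

The first query mixes a collection and a workflow in its hits; adding the type filter narrows the result to the workflow only, which is the kind of distinction the classification is meant to support.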
While BISE allows researchers to search for bioimage analysis resources at all these levels, general web search engines such as Google typically return hits for collections but do not reach the details of their components. In addition, workflows are in many cases hidden in biological papers and are difficult to discover. BISE is also designed to feature users' impressions of the usability of components and workflows, so that individual experiences can be swiftly shared within the community.
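Returning to the recommendation to publish workflows as executable scripts with documented parameter sets, one possible way for a script to document itself is to write out a small provenance record next to its results. The sketch below is only an assumption of what such a record might contain; `WORKFLOW_VERSION`, `PARAMETERS` and `provenance_record` are hypothetical names, and in practice the version string would come from the version control system (for example a tag or commit identifier).

```python
import hashlib
import json
import platform
from typing import Any, Dict

# Hypothetical version label; in a version-tracked script this would be
# taken from the version control system rather than typed by hand.
WORKFLOW_VERSION = "1.0.0"

# Every tunable value of the workflow in one documented place.
PARAMETERS: Dict[str, Any] = {
    "blur_radius": 1,        # pixels
    "threshold_level": 0.4,  # fraction of the maximum intensity
}

def checksum(data: bytes) -> str:
    """SHA-256 digest, so readers can verify they analyze the same input."""
    return hashlib.sha256(data).hexdigest()

def provenance_record(input_bytes: bytes,
                      parameters: Dict[str, Any],
                      version: str) -> Dict[str, Any]:
    """Collect what is needed to state exactly how a result was produced."""
    return {
        "workflow_version": version,
        "parameters": dict(parameters),
        "input_sha256": checksum(input_bytes),
        "python_version": platform.python_version(),
    }

# Placeholder bytes stand in for the content of an image file.
record = provenance_record(b"example pixel data", PARAMETERS, WORKFLOW_VERSION)
print(json.dumps(record, indent=2, sort_keys=True))
```

Such a record states the script version and parameters used for a published result; it does not capture the full execution environment, which is the part that packaging the script in a container image addresses.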

Take Home Message
Within the world of bioimage analysis software, various types of tools coexist. They can be classified as "collections", "components", or "workflows", yet they are offered to the public without distinction as "software tools". A clear definition of these types and recognition of the role of each are a foundation for learning and teaching bioimage analysis.
Further Readings
1. Miura and Tosi (2016) discusses the general challenges of bioimage analysis.
2. Miura and Tosi (2017) provides more details on the structure and design of bioimage analysis workflows.
3. Details about NEUBIAS can be found at the following web pages:
   - http://neubias.org
   - https://www.cost.eu/actions/CA15124: The Memorandum of Understanding describes the objectives of the network, including the motivation to create the registry http://biii.eu.

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.