Using registries to integrate bioinformatics tools and services into workbench environments
The diversity and complexity of bioinformatics resources present significant challenges to their localisation, deployment and use, creating a need for reliable systems that address these issues. Meanwhile, users demand increasingly usable and integrated ways to access and analyse data, especially within convenient, integrated “workbench” environments. Resource descriptions are the core element of registry and workbench systems, which use them both to help the user find and comprehend available software tools, data resources, and Web Services, and to localise, execute and combine them. The descriptions are, however, hard and expensive to create and maintain, because they are volatile and require an exhaustive knowledge of the described resource, its applicability to biological research, and the data model and syntax used to describe it. We present here the Workbench Integration Enabler, a software component that will ease the integration of bioinformatics resources in a workbench environment, using the descriptions provided by the existing ELIXIR Tools and Data Services Registry.
Keywords: Bioinformatics, Service registry, Service integration
Ongoing advances in bioinformatics have produced a vast and ever-increasing number of computational methods and biological databases, available in multiple forms, such as downloadable software or data and remote services for data analysis, query, and retrieval. This diversity and complexity present significant challenges to resource description, localisation and deployment. Meanwhile, users demand increasingly convenient and usable ways to access and analyse data, especially within environments that integrate resources or handle workflows. We propose a novel approach to the integration of existing resources in such environments that reuses resource descriptions extracted from the ELIXIR Tools and Data Services Registry [27, 28], hereafter referred to as the “ELIXIR registry”.
Registries address the question of resource discovery, i.e. finding and understanding relevant resources, by collating resource descriptions into a searchable catalogue. Examples of registries within bioinformatics include the EMBRACE Web Service Registry, BioCatalogue, and AppDB. Other systems, such as BioMoby and Soaplab [24, 25], were developed to enable both the description and the execution of services, using a Web-based interface. Such registries face significant challenges. Solutions based on Web-service technologies do not always scale to the large data volumes required for high-throughput omics analyses. Furthermore, centralised registry efforts have tended to deteriorate in the long term and, where they have not been coupled to environments for accessing the resources, fulfil only the discovery purpose. ELIXIR, a European infrastructure for biological information, is constructing the Tools and Data Services Registry for bioinformatics resources from around the world. The registry is being built in collaboration with the key resource providers, upon a federated curation model which supports resource providers in the curation of their own resources. This model decentralises the curation burden and should lead to a registry that is not only more durable, but also of higher quality, because it leverages the knowledge of the resource providers. In this article, we outline the vision of coupling the ELIXIR registry to workbench environments to avoid duplication of curation efforts and maximise utility for users.
While registries address the question of resource discovery, the usage of the tools often remains difficult, because their configuration may be complex and their execution may rely on command-line or programmatic interfaces, which are not always transparent to the user. To improve accessibility, usability, and the combination of tools, workbenches enable tool execution through graphical, often Web-based, user interfaces. Most of these systems (e.g. Mobyle [17, 18], Galaxy [2, 10], Bio-jETI [15, 16], GenePattern, UGENE, Geneious, and Taverna [14, 30]) rely on detailed tool descriptions in a plugin-based architecture that automatically generates the user interface, invokes the tool, and displays the results in a homogeneous environment. Additionally, workbenches use the tool descriptions for other essential functions, such as searching for and explaining tools, and workflow composition. Thus, there are significant functional and conceptual overlaps between registries and workbenches, which are not reflected in the existing, uncoupled registry and workbench implementations.
The registration or integration of resources, whether in registry or workbench systems, relies on resource descriptions. The structure of such documents is described in detail in Sect. 2. The format of such plugin documents is usually complex and highly specific to the target system. Furthermore, because of the inherent complexity of tools, the descriptions can be difficult to create and maintain, especially for a registry or workbench curator who is not necessarily as familiar with a tool as the person who developed it. This leads to multiple recurring problems that have been addressed in various ways, as described in Sect. 4. In Sect. 5, we describe the new and complementary approach to this problem that we are currently developing: the semi-automatic generation of workbench integration components from the descriptions of resources registered in the ELIXIR Tools and Data Services Registry. This Workbench Integration Enabler will be a software component that eases the integration of bioinformatics resources in workbench or workflow environments such as Mobyle, Galaxy and Taverna, using the descriptions provided by the existing ELIXIR Tools and Data Services Registry.
2 Resource descriptions for registries and workbench systems
Registries and workbench systems both rely on a data model that enables the description of resources they integrate. However, because their functionalities are different, the information stored about the resources in both types of systems overlaps, but is not identical.
2.1 Resource descriptions for registries
A registry should enable users to:
- find a resource by various means, for example by the operation that needs to be performed, by its inputs and outputs, by the name of the resource or its author, by searching its description, or by the type of interface that is required
- verify the relevance of a selected resource by reading its description, the publication it refers to and the available documentation, by comparing it to existing offerings, etc.
- access the resource, which might for example be a Web service or a downloadable and installable package
- cite the resource in a publication.
To support these uses, a resource description in a registry typically includes:
- the name of the tool
- a URL to access the tool directly or to download it
- a short, human-readable description
- the list of its authors, and the list of the publications describing the tool
- the descriptions of the specific operations that are implemented and the types of data they process and produce, in both human- and machine-understandable forms.
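The elements listed above can be pictured as a small record type. The following is only a minimal sketch in Python: the class names, field names and the two example tools are hypothetical and do not reproduce the actual data model of the ELIXIR registry, and the operation and data-type labels are plain strings standing in for ontology references.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Function:
    """One operation of a resource, with the data it consumes and produces."""
    operation: str
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


@dataclass
class ResourceDescription:
    """Hypothetical registry record; field names are illustrative only."""
    name: str
    url: str
    description: str
    authors: List[str] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)
    functions: List[Function] = field(default_factory=list)


def find_by_operation(registry, operation):
    """Return the names of the resources that implement the given operation."""
    return [
        resource.name
        for resource in registry
        if any(f.operation == operation for f in resource.functions)
    ]


# Two made-up entries, used only to exercise the search function.
registry = [
    ResourceDescription(
        name="aligner-x",
        url="https://example.org/aligner-x",
        description="Aligns protein sequences.",
        authors=["A. Author"],
        publications=["doi:10.0000/example"],
        functions=[Function("alignment", ["sequence"], ["alignment"])],
    ),
    ResourceDescription(
        name="tree-y",
        url="https://example.org/tree-y",
        description="Builds a tree from an alignment.",
        functions=[Function("tree construction", ["alignment"], ["tree"])],
    ),
]

print(find_by_operation(registry, "alignment"))
```

Searching by operation is only one of the access paths mentioned above; searching by name, author or free-text description would follow the same pattern over other fields.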
2.2 Resource descriptions for workbenches
In addition to executing tools, workbenches use resource descriptions for:
- search, by enabling users to select relevant tools based on their classification or human-readable description fields
- combining tools, by selecting only those that can be chained together, either interactively or automatically, for successive steps in an analysis, based on the compatibility of the types of data and formats the tools consume and produce
- ancillary tasks such as data format detection and conversion, by using external tools declared as format detection/conversion utilities.
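The chaining criterion described above, compatibility of data types and formats between the output of one tool and the input of the next, can be sketched as follows. This is an illustrative simplification and not the algorithm of any particular workbench: the `Port` and `Tool` classes, the tool names and the type and format labels are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Port:
    """An input or output of a tool: a data type and the formats it accepts."""
    data_type: str
    formats: FrozenSet[str]


@dataclass(frozen=True)
class Tool:
    name: str
    inputs: Tuple[Port, ...]
    outputs: Tuple[Port, ...]


def can_chain(upstream, downstream):
    """True if an output of `upstream` matches an input of `downstream`:
    same data type and at least one shared format."""
    return any(
        out.data_type == inp.data_type and bool(out.formats & inp.formats)
        for out in upstream.outputs
        for inp in downstream.inputs
    )


def next_steps(tool, catalogue):
    """Names of the tools in the catalogue that can directly follow `tool`."""
    return [t.name for t in catalogue if t is not tool and can_chain(tool, t)]


# Made-up catalogue of three tools.
aligner = Tool(
    "aligner",
    inputs=(Port("sequence", frozenset({"fasta"})),),
    outputs=(Port("alignment", frozenset({"fasta", "clustal"})),),
)
tree_builder = Tool(
    "tree-builder",
    inputs=(Port("alignment", frozenset({"clustal", "phylip"})),),
    outputs=(Port("tree", frozenset({"newick"})),),
)
plotter = Tool(
    "plotter",
    inputs=(Port("tree", frozenset({"newick"})),),
    outputs=(),
)
catalogue = [aligner, tree_builder, plotter]

print(next_steps(aligner, catalogue))
```

When two ports share a data type but no format, a workbench could fall back on one of the declared conversion utilities; that case is deliberately left out of this sketch.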
3 Comparison of description attributes in the ELIXIR Tools and Data Services Registry and Mobyle workbench
In both systems, a tool description can be divided into three parts:
- the basic description of the tool is a broad description of what it does, together with contextual information such as authors and publications, in both human- and machine-interpretable terms. It is mostly used for search purposes.
- the function describes how to interact with the given tool, by providing a more detailed description of its inputs and outputs, and of the operations it can perform.
- the implementation of the tool enables its automatic execution. In the case of Mobyle, this requires, for a command-line tool, a description of how a user’s request is transformed into a command, and of how its results are captured once the command has been executed.
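To make the implementation level more concrete, the sketch below shows one possible way of turning a user's request into a command line from per-parameter descriptions. It is a simplified stand-in, not Mobyle's actual description format or code: the `Parameter` fields (`flag`, `position`, `is_switch`), the tool name `mytool` and its options are hypothetical.

```python
import shlex
from dataclasses import dataclass
from typing import Optional


@dataclass
class Parameter:
    """Hypothetical parameter-level invocation description."""
    name: str
    flag: Optional[str]      # e.g. "--gap"; None for a positional argument
    position: int            # ordering of the argument on the command line
    is_switch: bool = False  # True if the flag takes no value


def build_command(executable, parameters, request):
    """Translate a request (parameter name -> value) into an argument list."""
    argv = [executable]
    for param in sorted(parameters, key=lambda p: p.position):
        if param.name not in request:
            continue  # parameter left unset by the user
        value = request[param.name]
        if param.is_switch:
            if value:
                argv.append(param.flag)
            continue
        if param.flag:
            argv.append(param.flag)
        argv.append(str(value))
    return argv


# Made-up description of a command-line tool.
PARAMETERS = [
    Parameter("input", flag=None, position=3),
    Parameter("gap", flag="--gap", position=1),
    Parameter("verbose", flag="-v", position=2, is_switch=True),
]

argv = build_command(
    "mytool", PARAMETERS, {"input": "seqs.fasta", "gap": 10, "verbose": True}
)
# shlex.join quotes each argument so the printed line is safe to paste in a shell.
print(shlex.join(argv))
```

Building an argument list rather than a single string avoids quoting problems when values contain spaces. Capturing the results after execution, the other half of the implementation level, is not covered here.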
Comparison of the data model of the ELIXIR and the Mobyle tool descriptions
- Tool level: topics (EDAM refs); tool-level invocation code
- Function level: functions (one or more functions performed by a given service); function name (EDAM ref)
- Parameter level: parameters (one parameter per input or output of each function of the service); parameter handle (EDAM ref); parameter-level invocation code
4 Creation and maintenance of tool descriptions within a workbench
Creating and maintaining tool descriptions within a workbench raises several recurring problems:
- the evolution of a tool is not always captured in a timely manner. For instance, a new version of a piece of software may have new input parameters, but this is often not reflected immediately upon deployment, especially if the new inputs are optional. Such a change can happen without notice, because it does not break the existing tool usage, but it nevertheless introduces a bias in the available interface.
- the descriptions for workbench environments are not exhaustive with regard to all of the options available in the published software, because they are biased towards the initially intended usage. The time required to describe the software can be reduced by modelling only the minimal set of options that are immediately needed, but this limits the potential of the tool or even prevents some niche uses completely.
- the tool descriptions tend to focus on the execution layer, which enables the execution of the tool but lacks the peripheral information that is useful to achieve a greater degree of integration. This is an acute issue for finding tools, and for their usability (requiring documentation), composition (requiring parameter typing), provenance tracking (requiring a record of settings and their semantics), and attribution (requiring means of accreditation, citation, etc.). Given the importance of these aspects, especially for users unfamiliar with the tools, this can greatly reduce the utility of such interfaces.
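The first two problems amount to a mismatch between the parameters a description declares and the options the deployed tool actually offers. A minimal sketch of such a check is given below, assuming that both lists of parameter names are available; how the actual option list is obtained (release notes, help output, or the tool provider) is outside the sketch, and the function and parameter names are hypothetical.

```python
def compare_parameters(described, actual):
    """Compare the parameter names declared in a tool description with the
    option names offered by the deployed version of the tool."""
    described, actual = set(described), set(actual)
    return {
        # options the tool offers but the description does not expose
        "undescribed": sorted(actual - described),
        # parameters still described but no longer offered by the tool
        "obsolete": sorted(described - actual),
    }


# Made-up example: a new optional "threads" option appeared in a new release.
report = compare_parameters(
    described=["input", "gap_open"],
    actual=["input", "gap_open", "threads"],
)
print(report)
```

An empty report does not mean the description is complete in the sense of the third problem: documentation, typing and attribution information cannot be checked by comparing names alone.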
5 Usage of tool descriptions from ELIXIR Tools and Data Services Registry as templates for creation of tool wrappers for workbenches
Using the tool descriptions of the ELIXIR registry as templates for workbench wrappers offers several benefits:
- collaboration between the ELIXIR registry and workbench maintainers, to maintain in one place the information that is required for both the registry and the workbenches, will save time and effort, and lead to better tool descriptions and more durable registry and workbench environments. This is especially so given that ELIXIR supports this vital “document once” principle and supports resource providers in the curation of their own resources.
- the ELIXIR registry uses the EDAM ontology to provide a controlled vocabulary for the description of scientific topics, software operations, types of data and data formats. By propagating these annotations to the workbench, the end user will benefit from rich and consistent tool descriptions in both environments. Further, EDAM development will leverage the user communities of both environments, ensuring that the vocabulary fulfils users' needs.
- updates to the registered tool descriptions can propagate from the ELIXIR registry, via a notification service, to integrators. This will inform integrators about changes and summarise them, so that they can be acted on in a timely manner.
- general information such as the authors and references is emphasised in the ELIXIR registry but tends to be neglected by integrators when they create tool descriptions. It will be a valuable complement to the workbench, useful for both tool providers and users.
- integration of tools with standardised interfaces, such as EMBOSS tools, can be completely automated by merging the technical information provided with the tools (such as the tool descriptions in EMBOSS) with the applicability and attribution information from the ELIXIR Tools and Data Services Registry.
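The semi-automatic nature of this approach can be illustrated with a short sketch: the fields shared by the registry and the workbench are copied into a wrapper skeleton, and the invocation details that the registry does not hold are left empty for the integrator to complete. The dictionary keys and the example entry below are assumptions made for illustration; they are neither the ELIXIR registry schema nor the wrapper format of any specific workbench.

```python
def make_wrapper_skeleton(entry):
    """Build a workbench wrapper skeleton from a registry entry (a dict)."""
    skeleton = {
        "name": entry["name"],
        "description": entry.get("description", ""),
        "authors": list(entry.get("authors", [])),
        "references": list(entry.get("publications", [])),
        "topics": list(entry.get("topics", [])),
        "command": None,  # tool-level invocation code: not held by the registry
        "parameters": [],
    }
    for function in entry.get("functions", []):
        for direction in ("inputs", "outputs"):
            for param in function.get(direction, []):
                skeleton["parameters"].append({
                    "operation": function["operation"],
                    "direction": direction[:-1],  # "input" or "output"
                    "data_type": param["data_type"],
                    "formats": list(param.get("formats", [])),
                    "invocation": None,  # parameter-level invocation code
                })
    return skeleton


def missing_fields(skeleton):
    """List the fields an integrator still has to fill in by hand."""
    todo = []
    if skeleton["command"] is None:
        todo.append("command")
    for index, param in enumerate(skeleton["parameters"]):
        if param["invocation"] is None:
            todo.append(f"parameters[{index}].invocation")
    return todo


# Made-up registry entry.
entry = {
    "name": "aligner-x",
    "description": "Aligns protein sequences.",
    "authors": ["A. Author"],
    "publications": ["doi:10.0000/example"],
    "topics": ["sequence analysis"],
    "functions": [{
        "operation": "alignment",
        "inputs": [{"data_type": "sequence", "formats": ["fasta"]}],
        "outputs": [{"data_type": "alignment", "formats": ["clustal"]}],
    }],
}

skeleton = make_wrapper_skeleton(entry)
print(missing_fields(skeleton))
```

For tools with standardised interfaces, the empty fields could in principle be filled from the technical descriptions shipped with the tools, which is the fully automated case mentioned in the last item above.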
We have presented here a novel way to improve the integration of bioinformatics resources in workbench systems, by mapping and translating the resource metadata contained in the ELIXIR registry. This approach can significantly reduce the problems cited above that hinder the generation and maintenance of resource descriptions, by improving their quality, comprehensiveness and timeliness. When implemented as a service, it will lower the cost to developers of integrating their resources in key workbench environments, and help bioinformaticians to build, use and update well-documented and reproducible workflows. It will therefore be a practical way to improve resource utility, including interoperability. As new, high-priority tools and services are added to the ELIXIR registry, they can be offered as candidates for inclusion in the workbench instances. This will in turn inform and drive the curation of such resources in sufficient detail to support their integration and invocation. We also plan to capitalise on the use of the ELIXIR registry as a reference for both service providers and integrators, to facilitate exchanges between these two groups of experts.
This work was partly funded by ELIXIR, the research infrastructure for life-science data. Hervé Ménager wishes to thank Bertrand Néron, Fabien Mareuil and Olivia Doppelt-Azeroual for their insights on the maintenance of wrappers for workbench systems.
- 1. Bhagat, J., Tanoh, F., Nzuobontane, E., Laurent, T., Orlowski, J., Roos, M., Wolstencroft, K., Aleksejevs, S., Stevens, R., Pettifer, S., et al.: Biocatalogue: a universal catalogue of web services for the life sciences. Nucleic Acids Res. 38, W689–W694 (2010). doi:10.1093/nar/gkq394
- 2. Blankenberg, D., Kuster, G.V., Coraor, N., Ananda, G., Lazarus, R., Mangan, M., Nekrutenko, A., Taylor, J.: Galaxy: a web-based genome analysis tool for experimentalists. Curr. Protoc. Mol. Biol. (2010). doi:10.1002/0471142727.mb1910s89
- 6. Cock, P.J.A., Grüning, B.A., Paszkiewicz, K., Pritchard, L.: Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology. PeerJ 1, e167 (2013). doi:10.7717/peerj.167
- 9. EGI Application Database (AppDB). https://appdb.egi.eu. Accessed 21 July 2015
- 13. Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton, S., Cooper, A., Markowitz, S., Duran, C., Thierer, T., Ashton, B., Meintjes, P., Drummond, A.: Geneious basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28(12), 1647–1649 (2012). doi:10.1093/bioinformatics/bts199. http://bioinformatics.oxfordjournals.org/content/28/12/1647.abstract
- 15. Lamprecht, A.L., Naujokat, S., Margaria, T., Steffen, B.: Semantics-based composition of EMBOSS services with Bio-jETI. In: Proceedings of the Workshop on Semantic Web Applications and Tools for Life Sciences (SWAT4LS). Amsterdam, The Netherlands, November 20, 2009 (2009)
- 16. Lamprecht, A.L., Naujokat, S., Margaria, T., Steffen, B.: Semantics-based composition of EMBOSS services. J. Biomed. Semant. 2(S-1), S5 (2011)
- 17. Ménager, H., Gopalan, V., Néron, B., Larroudé, S., Maupetit, J., Saladin, A., Tufféry, P., Huyen, Y., Caudron, B.: Bioinformatics applications discovery and composition with the Mobyle suite and Mobylenet. In: Resource Discovery, pp. 11–22. Springer, Berlin (2012)
- 24. Senger, M., Rice, P., Bleasby, A., Oinn, T., Uludag, M.: Soaplab2: more reliable sesame door to bioinformatics programs. In: Bioinformatics Open Source Conference, BOSC, vol. 8 (2008)
- 25. Senger, M., Rice, P., Oinn, T.: Soaplab-a unified sesame door to analysis tools. In: Proceedings of the UK e-Science All Hands Meeting, vol. 18, pp. 509–513. Citeseer (2003)
- 26. Tatum, Z., den Dunnen, J., Laros, J.F.: CLI-mate: an interface generator for command line programs. In: Proceedings of the 4th International Workshop on Semantic Web Applications and Tools for the Life Sciences, pp. 114–115. ACM (2011)
- 27. The Danish ELIXIR node. http://elixir-node.cbs.dtu.dk/. Accessed 21 July 2015
- 28. The ELIXIR Tools and Data Services Registry. http://elixir-registry.cbs.dtu.dk. Accessed 21 July 2015
- 30. Wolstencroft, K., Haines, R., Fellows, D., Williams, A., Withers, D., Owen, S., Soiland-Reyes, S., Dunlop, I., Nenadic, A., Fisher, P., et al.: The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Res. 41, W557–W561 (2013). doi:10.1093/nar/gkt328
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.