Introduction

MycoBank was officially launched in 2004 as an online repository with the primary aim of registering all published fungal taxonomic novelties (including new names and combinations) and making them available to the mycological community in an open access database (Crous et al. 2004). One of the major constraints experienced by mycologists was that many newly published fungal names were inaccessible to researchers in developing countries, or were simply overlooked, because they were published in obscure sources. Due to the large number of names published each year in a range of publications, MycoBank curators were not always able to verify and include all of them in the database. To address this issue, we approached a large number of editors of journals that published taxonomic novelties, and suggested that they request authors to deposit nomenclatural data, descriptions and illustrations in MycoBank as good practice. This positions MycoBank as a phenotypic equivalent of GenBank, the main database for genotypic data. Authors would receive a unique identifier to link the registration to the name (equivalent to a GenBank accession number for sequence data), and would simultaneously be assured that no homonyms were published, as the search engine would inform authors if the name was already occupied (https://doi.org/www.mycobank.org). Registration was seen as a two-step process: upon acceptance of the article, authors deposit their taxonomic novelties and provide the MB numbers in the protologue; upon publication, they notify MycoBank so that the taxonomic novelty can be released to the community with date, volume and page numbers.

So popular was the system with mycologists, that proposals to make the deposit of the key elements mandatory for the valid publication of new scientific names of fungi, at all ranks, were prepared (Hawksworth et al. 2010), and debated at the 9th International Mycological Congress in Edinburgh in 2010. These were put before the Nomenclature Section of the 18th International Botanical Congress in Melbourne in July 2011, and incorporated into the International Code of Nomenclature for algae, fungi, and plants (McNeill et al. 2012). MycoBank can do much more than complete the basic requirements of the Code, but the only mandatory elements are the: name; rank; authorship; bibliographic details of the anticipated place of publication; diagnosis (or description) for names of novel taxa (which from 1 January 2012 can be in English or Latin); full bibliographic details of the basionym or replaced name for new combinations, names at new ranks, or replacement names; and for names of novel taxa also details of the name-bearing type and the institution or other place in which it is permanently preserved.

Although MycoBank was initially set up by CBS-KNAW staff in close collaboration with Index Fungorum, in 2009 it was decided that the ownership of the MycoBank system, database and website should be transferred to the International Mycological Association (IMA). In 2010 a new version of the MycoBank website was launched, based on the BioloMICS software (Robert et al. 2011). The advantage of this software is that the structure of the database can evolve according to the needs identified by the end users and the curators of MycoBank. The BioloMICS-based version of MycoBank has been regularly updated and improved since then. In this article, we present the major developments achieved during the past four years, as well as some usage statistics of the MycoBank system. In the last section, we briefly describe how we see the database evolving in the coming years.

New Developments

Infrastructure

The latest version of the MycoBank software was released in April 2012. It allows curators to create new tables and fields as their own needs and those of the end users evolve, without the intervention of software developers. This is essential when new types of data and the associated analytical tools are incorporated into the system.

In order to ensure a high level of security and availability of the MycoBank website, the whole MycoBank system (software, databases and website) has been transferred to a professional datacentre where power supply, Internet connections and backups are guaranteed.

In order to keep MycoBank users aware of the latest news and improvements related to the database and software, a “News” section was created that can be accessed at https://doi.org/www.mycobank.org/BioloMICSNews.aspx.

Since the MycoBank website offers a large number of features, a “Frequently Asked Questions” and a “Help” section are now available, providing a number of answers and videos associated with commonly asked questions (these features are available under the Help button on the main menu).

Queries

The new software interface was created in order to improve flexibility for queries. Basic (https://doi.org/www.mycobank.org/Biolomics.aspx?Table=Mycobank) and advanced queries (https://doi.org/www.mycobank.org/Biolomics.aspx?Table=Mycobank_Advanced) are now possible. Advanced users can build complex Boolean queries by combining AND, OR and NOT together with brackets. This makes it possible to ask a question such as “find all Candida species published after 1990 by Kurtzman and not by Fell”. This query will look like this: (Taxon contains Candida) AND (Publication date is after 1990) AND (Authors contains Kurtzman) AND NOT (Authors contains Fell). Results of queries are displayed as lists that can be exported to MS-Excel sheets or MS-Word documents.
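As an illustration only, the Boolean query above can be assembled programmatically from individual clauses. The sketch below is a minimal example and not part of the MycoBank software: the `Clause` class and `build_query` function are hypothetical names, and only the textual form of the query is taken from the example given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clause:
    """One bracketed condition of an advanced query (hypothetical helper)."""
    field: str
    operator: str
    value: str
    negate: bool = False

    def render(self) -> str:
        # Each clause is wrapped in brackets; negated clauses get a NOT prefix.
        text = f"({self.field} {self.operator} {self.value})"
        return f"NOT {text}" if self.negate else text


def build_query(clauses):
    """Join all clauses with AND, as in the example query in the text."""
    return " AND ".join(clause.render() for clause in clauses)


query = build_query([
    Clause("Taxon", "contains", "Candida"),
    Clause("Publication date", "is after", "1990"),
    Clause("Authors", "contains", "Kurtzman"),
    Clause("Authors", "contains", "Fell", negate=True),
])
print(query)
```

The printed string reproduces the query shown above; OR combinations and nested brackets would need an additional grouping type and are left out of this sketch.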

In addition to the main taxonomic database, we have also added a bibliographic query system (https://doi.org/www.mycobank.org/Biolomics.aspx?Table=Mycobank%20literature) as well as a thesaurus of terms commonly used in mycology (https://doi.org/www.mycobank.org/Biolomics.aspx?Table=Thesaurus).

Name registration

The interface for the registration of the scientific names of new taxa, and new names, has also been redesigned and simplified, with fewer required steps than the previous version (Fig. 1). Popup windows are presented to depositors in order to facilitate data entries such as links to existing bibliographic records, country name, or higher taxonomic ranks.

Fig. 1

Main form for the deposit of a new name. Depositors typically enter the data for a new taxon in 3–4 min per entry.

It is recommended in the Code (McNeill et al. 2012: Rec. 42A.1) that registration numbers are obtained only “after a work is accepted for publication”. That is a wise precaution as during review it sometimes becomes necessary to change the chosen names.

Type registration

During the nomenclature discussion sessions at the 9th International Mycological Congress in Edinburgh (IMC9), the wish was expressed that MycoBank should start capturing typification events, as these are difficult to trace in the literature. Furthermore, without a clear overview of typification events, different authors might easily designate different lectotypes, epitypes or neotypes for the same name, which would be unfortunate and could lead to the same name being applied in different senses. We strongly support this suggestion, and anticipate that proposals to make the registration of later typifications mandatory will be made to the Nomenclature Section of the 19th International Botanical Congress in 2017.

In the summer of 2013, a new typification registration system was thus added to MycoBank. Mycologists can now log in to the system, and choose to register a type specimen for an existing taxon (this new option has been added directly below the normal “register new name” option, which delivers MB numbers). It means that mycologists can now get “MBT” numbers (MycoBank Typification numbers) for the designation of lectotypes, epitypes, and neotypes. However, if a novel combination or new name is linked to a typification event, a normal MB number would suffice, as the mycologist can directly indicate during the registration process of a combination or new name that the typification event is based on an epitype, neotype, or holotype specimen.

MBT numbers are most appropriately cited in typification sections of papers as follows:

Type. Italy: Padua, on withering leaves of Hedera helix, July 1875, L. Ranger (L-lectotype designated here, MBT12345); Sardegna, Cologne near Oleina, leaf litter of H. helix, 31 Aug. 1970, I. Hulk (CBS H-16992 — epitype designated here MBT176244, culture ex-epitype CBS 937.70).
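Because MBT numbers follow a recognizable pattern in such citations, they can be located automatically in a manuscript. The following sketch assumes, based only on the example above, that an identifier consists of the prefix "MBT" followed by digits; the function name `extract_mbt_numbers` is hypothetical and the citation text is abridged from the example.

```python
import re

# Assumption: an identifier is "MBT" followed directly (or after one space) by digits.
MBT_PATTERN = re.compile(r"\bMBT\s?(\d+)\b")


def extract_mbt_numbers(text):
    """Return all MBT identifiers found in a typification statement."""
    return [f"MBT{digits}" for digits in MBT_PATTERN.findall(text)]


citation = (
    "Type. Italy: Padua, on withering leaves of Hedera helix, July 1875, "
    "L. Ranger (L-lectotype designated here, MBT12345); "
    "epitype designated here MBT176244, culture ex-epitype CBS 937.70."
)
print(extract_mbt_numbers(citation))
```

Ordinary MB numbers are deliberately not matched, since the pattern requires the full "MBT" prefix.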

Multi-lingual system

A major complaint of some users of the earlier version of MycoBank was that it was only available in English, which in practice excluded people who have difficulties with this language. For this reason, the software was modified to allow multiple languages to be displayed, and we contacted native Arabic, Chinese, Dutch, French, German, Portuguese, Spanish and Thai mycologists to translate the standard text (Fig. 2). Additional languages will be added as required by the community. Japanese and Russian will be added in 2014.

Fig. 2

MycoBank is now multilingual and offers the possibility to access the information in several major languages. MycoBank homepage displayed here in Arabic.

Forum

Since the Amsterdam declaration on fungal nomenclature (Hawksworth et al. 2011), and the introduction of the new Code (McNeill et al. 2012), mycologists face several new challenges in reaching consensus with regard to the “one fungus one name” nomenclature (Hawksworth 2011, Wingfield et al. 2012). Two years ago, when discussions were initiated, we felt that there was a need to create a discussion forum to exchange ideas about dual nomenclature, and about which name should be retained. Hence, the Forum option was created and a large number of topics and discussions were initiated (Fig. 3).

Fig. 3

MycoBank Forum.

Annotations and remote curation

Like many working databases, MycoBank is incomplete and contains errors and omissions that require continuous updates by curators. However, it is virtually impossible for the small team of MycoBank curators to sustain such a huge task. The annotation system was therefore created to allow users (after a registration open to anyone) to post comments, suggest corrections or propose new data associated with already deposited taxa. Curators can then accept, reject or simply leave the comments as pending (Fig. 4). It is not, however, the role of Curators to impose a particular taxonomy, as differences in scientific opinion have to be accommodated.

Fig. 4

The MycoBank annotation system.

The same reasons that led us to include an open annotation system also led to a request for help from additional Curators. To achieve this goal, from April 2014 remote curation using the Citrix XenApp software will allow volunteer specialists (approved as curators by the MycoBank Advisory Board) to manage sections of the database related to their competences. The first workshop for new Curators will be given at CBS in Utrecht on Saturday 26 April 2014, with a further session planned at the International Mycological Congress (IMC10) in Bangkok.

Web services and central system for registration of fungal names

Many users and websites are interested in obtaining data in batches and incorporating them into their own databases. Since MycoBank is a public database used by many other repositories, it was important to provide a number of web services that can be consumed by third party machines. We therefore created several dynamic web services that can easily be changed or adapted if needed.

One year ago, one additional mycological taxon name registration website was established (Fungal Names — FN in China at https://doi.org/fungalinfo.im.ac.cn/fungalname/fungalname.html), in collaboration with the long established Index Fungorum website, which also provides the option for online registration (Index Fungorum — IF in the UK at https://doi.org/www.indexfungorum.org). The International Commission on the Taxonomy of Fungi (ICTF) and the Nomenclature Committee for Fungi (NCF) suggested that the three registrars should synchronize their data and MycoBank was asked to create a central web service that would provide unique numbers to the three systems and exchange data among them. The system was released in June 2013, and IF and FN are currently implementing the needed changes to their system in order to have a fully synchronized system.
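The idea of a central service that issues unique numbers to several registrars can be sketched as follows. This is a toy, in-memory illustration under stated assumptions and does not describe the actual MycoBank web service: the class `CentralRegistry`, its `register` method, the starting number and the taxon names are all hypothetical.

```python
import itertools
import threading


class CentralRegistry:
    """Toy central number allocator shared by several registrars (hypothetical)."""

    def __init__(self, first_number=1):
        self._counter = itertools.count(first_number)
        self._lock = threading.Lock()
        self._number_by_name = {}
        self._records = {}

    def register(self, registrar, name):
        """Issue the next unique number, refusing names that are already occupied."""
        with self._lock:
            if name in self._number_by_name:
                raise ValueError(f"name already registered: {name}")
            number = next(self._counter)
            self._number_by_name[name] = number
            self._records[number] = (registrar, name)
            return number

    def records(self):
        """Return a copy of all records, e.g. for synchronizing the registrars."""
        with self._lock:
            return dict(self._records)


registry = CentralRegistry(first_number=1000)
# "MB", "IF" and "FN" stand for the three registrars; the names are placeholders.
print(registry.register("MB", "Examplea prima"))
print(registry.register("IF", "Examplea secunda"))
print(registry.register("FN", "Examplea tertia"))
```

Because all requests pass through a single counter, no two registrars can receive the same number, and a name deposited through one registrar cannot be deposited again through another.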

Links to third parties

Many other websites are rich resources that can be associated with fungal names available in the MycoBank system. Structural links to the following websites have been created: Catalogue of Life (CoL), Encyclopedia of Life (EOL), Global Biodiversity Information Facility (GBIF), Index Fungorum (IF), Integrated Taxonomic Information System (ITIS), Google Scholar, PubMed, Google, Wikimedia, Wikipedia, Wikispecies, BOLD Systems, EMBL, NCBI, All Russian Collection of Microorganisms (VKM), and CBS collection and StrainInfo. More ad hoc links are also available for some taxa.

Identification services

MycoBank is not only a repository of data associated with fungal names and vouchers, but also offers unique online pairwise sequence identification services (https://doi.org/www.mycobank.org/biolomicssequences.aspx) against curated databases such as Q-bank (https://doi.org/www.q-bank.eu), CBS collections websites (https://doi.org/www.cbs.knaw.nl, Fusarium, dermatophytes, indoor fungi, Calonectria, Yeasts, etc), Fungal Barcoding (https://doi.org/www.fungalbarcoding.org), EUBOLD system (https://doi.org/www.eubold.org), ISHAM ITS Database for Human and Animal Pathogenic Fungi (https://doi.org/www.mycologylab.org) or UNITE (https://doi.org/unite.ut.ee). NCBI/GenBank databases can also be used to perform pairwise sequence alignments. Users interested in identifying unknown sequences can compare them against all the desired reference databases, either at the same time or separately; the results are gathered centrally and presented as a single list of matches.
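How results from several reference databases might be gathered into one list can be sketched as below. This is an illustration under assumptions, not the implementation used by MycoBank: the `Hit` record, the `merge_hits` function, the database labels and the similarity values are hypothetical, and ranking by percentage similarity is only one possible criterion.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One match returned by a reference database (hypothetical structure)."""
    database: str
    taxon: str
    similarity: float  # percentage similarity of the pairwise alignment


def merge_hits(results_by_database, limit=10):
    """Combine per-database hit lists into one list, best matches first."""
    merged = [hit for hits in results_by_database.values() for hit in hits]
    merged.sort(key=lambda hit: hit.similarity, reverse=True)
    return merged[:limit]


results = {
    "database_a": [Hit("database_a", "Taxon one", 99.2),
                   Hit("database_a", "Taxon two", 97.5)],
    "database_b": [Hit("database_b", "Taxon one", 98.7)],
}
for hit in merge_hits(results):
    print(f"{hit.similarity:5.1f}  {hit.taxon}  ({hit.database})")
```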

Other more advanced identification services are also possible using a combination of morphological, physiological and/or molecular data (see https://doi.org/www.mycobank.org/Defaultlnfo.aspx?Page=polyphasicID).

Statistics

In total, 254,120 unique visitors accessed the English version of MycoBank between April 2012 and 3 December 2013. In December 2012 we launched several language versions of the website: French (3992 unique visitors), Arabic (2466), Chinese (1953), Dutch (1079), German (1828), Portuguese (2141) and Spanish (2207). Recently, a Thai language version has been introduced.

On an average day, 1872 unique users visit one of the MycoBank portals, and the average visit lasts 6–10 min per user.

The MycoBank user-base is truly global: 13.65 % of the users are located in the USA, but people from 205 countries have used MycoBank since April 2012. Table 1 lists the top 10 countries using MycoBank.

Table 1 Top 10 countries using MycoBank.

Researchers who deposit new scientific names in MycoBank, take part in forum discussions or annotate taxon records have to be registered in MycoBank. In total, 5680 profiles have been registered since MycoBank was initiated in 2004. During the period between 1985 and 2012, 8031 different taxonomists published at least one new fungal species. The average number of authors was 1.86/species. The 50 most prolific authors contributed to 22.9 % of the new species, the 100 most prolific to 32.1 %, and the 1000 most prolific to 74.3 %, while 6077 authors published only between 1 and 5 new species. One hundred and seventeen authors published more than 100 new species during this period.

The evolution of the number of newly described species between 1759 and 2009 can be seen in Fig. 5. The number of new species grew constantly (except during the World War II period) despite the reduced number of fungal taxonomists. This is likely due to new technologies allowing mycologists to better distinguish specimens and cultures and therefore separate species, and new techniques permitting them to process and handle larger numbers of specimens. Between 2003 and 2012, the number of newly described species varied from 1692 in 2005 to 3541 in 2012 (2436, 2450, 1692, 1868, 2271, 2391, 2724, 2155, 2374, and 3541).
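The yearly figures quoted above can be summarized with a few lines of code. The sketch assumes that the ten values in parentheses are listed in chronological order from 2003 to 2012, which is consistent with the minimum in 2005 and the maximum in 2012 stated in the text; the variable names are illustrative.

```python
# New species per year, 2003-2012, as listed in the text (order assumed chronological).
counts = [2436, 2450, 1692, 1868, 2271, 2391, 2724, 2155, 2374, 3541]
per_year = dict(zip(range(2003, 2013), counts))

low_year = min(per_year, key=per_year.get)    # year with the fewest new species
high_year = max(per_year, key=per_year.get)   # year with the most new species
total = sum(counts)
mean = total / len(counts)

print(f"minimum: {per_year[low_year]} in {low_year}")
print(f"maximum: {per_year[high_year]} in {high_year}")
print(f"total: {total}, mean per year: {mean:.1f}")
```

Over the decade this gives a total of 23,902 newly described species, or about 2390 per year on average.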

Fig. 5

Decennial evolution of the number of described species between 1759 and 2009.

A more detailed analysis of changing patterns over time in the description of new fungal species will be presented elsewhere.

Future

MycoBank is one of the three repositories that fulfil an important requirement: the registration of scientific names now required by the Code. While it is increasingly becoming a rich source of knowledge at species, genus, family, and higher levels, the databases of the International Nucleotide Sequence Database Collaboration (INSDC), a consortium consisting of NCBI, EMBL and DDBJ, serve as the international repository for molecular sequences. The task of linking MycoBank entries based on reference material (specimens and strains) to INSDC sequences, often only known from environmental sequences, is a real challenge. It involves subjective taxonomic interpretations, as many species are described and circumscribed on the basis of non-molecular criteria (morphology, physiology, ecology, etc.). Voucher data annotated consistently in all databases will possibly remain the most effective way to link species names and their associated molecular data. The Darwin Core standard (https://doi.org/rs.tdwg.org/dwc/) has partly been proposed and is one way to standardize the formulation of such data. The links between species (and subsequently higher taxonomic ranks) and sequences can only be made via strains and specimens. The biological repositories of fungal voucher data, culture collections and herbaria (listed in Index Herbariorum), are of major importance because they house reference material, information and strains (Durães Sette et al. 2013). Other initiatives such as the barcoding of life (BOLD systems, EUBOLD, China BOLD, etc.) or the UNITE database are providing useful links between reference material and barcoding sequences. Some projects are dealing with the establishment of reference databases in specific fields such as medical mycology (“DNA barcoding of pathogenic fungi as the basis for the development of novel standardized diagnostic tools”, W. Meyer, V. Robert, D. Ellis & S. Chen, Australian NH&MRC grant).
The StrainInfo and WDCM databases are gathering strain data from culture collections and are providing links to INSDC databases, but their scope is limited to cultivable strains, and it is now commonly accepted that most of the diversity is not present in culture collections or museums but is simply unknown (Hawksworth 2001, Kirk et al. 2008, Blackwell 2011, Mora et al. 2011). Projects to digitize herbaria will provide additional information. One recent example is the Mycoportal funded by the US National Science Foundation. There still remain many research collections around the world with useful information on unique strains or specimens, but these are often unavailable to third parties. MycoBank also maintains an ex-type strain and specimen database that is linked to species descriptions, and which is in the process of being linked to INSDC-based sequences, in order to objectify or at least to provide a molecular background to species circumscriptions. Hence, the co-authors of this paper as well as other prominent mycologists and institutions are actively working on this matter and are preparing workshops, guidelines and tools to better bridge the gap between sequences and species. One way to address this problem is to encourage MycoBank depositors of new species to provide molecular data, in addition to strains or specimens, or links to these data during the registration process. It is common knowledge that the voluntary deposit of additional data is a burden to many researchers, but it must be remembered that it is not mandatory, even though the options to deposit extra information appear on the input screens. On the other hand, reliable, openly available data from databases and associated websites are a cornerstone of scientific progress. There are several ways to obtain data for reference databases. The first one is voluntary submission but, as already mentioned, this approach is only partly successful.
The second one is by incentives such as funding, increased citations and improved visibility, facilitated by providing researchers with free, useful software and database-related tools. The third one is enforcement through new rules, established for example in the International Code of Nomenclature for algae, fungi, and plants, or by journals or reference databases. A combination of the three options may be needed to achieve the goal of a reliable taxonomy based on molecular data linked to accessible strains and specimens, and not only on phenotypic criteria.
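To make the role of consistently annotated voucher data more concrete, the sketch below shows how a voucher record expressed with Darwin Core terms could tie a species name to its sequences. The term names (`scientificName`, `typeStatus`, `institutionCode`, `catalogNumber`, `associatedSequences`) are Darwin Core terms; everything else is an assumption made for illustration, including the helper functions, the taxon name, the collection code and the accession placeholders.

```python
# Darwin Core terms assumed to be the minimum needed to identify a voucher.
REQUIRED_TERMS = ("scientificName", "institutionCode", "catalogNumber")


def voucher_key(record):
    """Build a voucher identifier, rejecting records with missing terms."""
    missing = [term for term in REQUIRED_TERMS if not record.get(term)]
    if missing:
        raise ValueError(f"missing Darwin Core terms: {', '.join(missing)}")
    return f"{record['institutionCode']}:{record['catalogNumber']}"


def link_sequences(records):
    """Map each species name to its vouchers and their sequence accessions."""
    links = {}
    for record in records:
        # Multiple values are separated by a vertical bar in this sketch.
        raw = record.get("associatedSequences", "")
        accessions = [item.strip() for item in raw.split("|") if item.strip()]
        links.setdefault(record["scientificName"], {})[voucher_key(record)] = accessions
    return links


# Placeholder record: none of these values refers to a real specimen or sequence.
example = {
    "scientificName": "Examplea prima",
    "typeStatus": "holotype",
    "institutionCode": "XYZ",
    "catalogNumber": "H-00001",
    "associatedSequences": "XX000001 | XX000002",
}
print(link_sequences([example]))
```

A record that lacks the collection code or catalogue number is rejected, which mirrors the point that names and sequences can only be connected reliably through an identifiable strain or specimen.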

Linking species data via molecular data using strains and specimens is important, but will not address all the problems, or exploit all the opportunities, created by the use of modern technologies. Next Generation Sequencing (NGS) methods or high throughput screening technologies already allow us to obtain large datasets that would not be accessible using traditional sampling, isolation and collecting methods. New species are traditionally based on the isolation of one, or ideally several, specimens that are studied and deposited in reference collections. With NGS it is possible to obtain millions of sequences from a single soil sample in a few hours and get an idea about the relative abundance of the taxa present. It is also possible and relatively easy to monitor the changes in ecosystems or hosts over time. The known diversity constitutes only a small fraction of the real fungal biodiversity (Hawksworth 2001, Kirk et al. 2008, Blackwell 2011, Mora et al. 2011). Given the drastic reduction in the number of taxonomists and in the financial support attributed to systematics, it is unlikely that traditional taxonomic approaches will ever allow us to get a near complete idea of the scope of microbial diversity. Therefore, ignoring the impact of new technologies such as NGS for the discovery of existing diversity would be a major mistake. Currently, there are no mechanisms allowing researchers to record, share and describe new taxa on the basis of such new technologies, other than the recently proposed system of UNITE (Kõljalg et al. 2013). Although there are a number of issues associated with these new technologies in terms of data quality, reproducibility and quantity, there is no definitive reason to ignore them. Hence, MycoBank, in collaboration with INSDC, UNITE and other DNA Barcoding initiatives (in its broad definition), will propose mechanisms and tools to record non-specimen based descriptions for candidate species (Taylor 2011).
We are currently working on tools for the semi-automated curation of large datasets, for fast and accurate assignments to species or candidate species. Given the amounts of data to be handled and analyzed, new technologies need to be developed. This can only be accomplished through the collaboration of several groups of experts, ranging from ecologists, taxonomists, molecular researchers, bioinformaticians, informaticians, mathematicians and database specialists to technologists focused on molecular or information technologies and hardware devices such as CPUs, GPUs, or FPGAs.