MassBank (Horai et al. 2010) is a public repository for metabolomics and lipidomics research containing mass spectra of small chemical compounds (<3000 Da). The database comprises MSn spectra of primary metabolites, flavonoids, gibberellins, saponins, carotenoids, oligosaccharides, and phospholipids. Lipids came into the scope of MassBank in 2008, as MassBank became the official database of the Mass Spectrometry Society of Japan. Spectra are acquired with various pre-separation techniques, ion sources, MS instruments, ion modes, and collision energies. The data collection is deliberately heterogeneous because the observed fragment ions and their intensities depend strongly on these setup parameters (Hopley et al. 2008), particularly if ESI is used as the ion source. The stored spectra are measurements of single chemical compounds, free of experimental errors and noise, and are therefore a high-quality reference. Compound identification is performed by a spectral similarity search based on a modified version of the weighted cosine correlation algorithm (Stein and Scott 1994); the output of a search is a list of candidate compounds for each spectrum, sorted by similarity score. However, matching spectra purely by similarity score usually yields many false positives, since fragment intensities vary strongly with the MS experimental setup. To improve the positive predictive value of the identification, MassBank stores, in addition to conventional MSn spectra, so-called merged spectra: aggregated spectra that combine spectra of the same compound recorded at different collision energies (usually 10 V, 20 V, 30 V, 40 V, and 50 V).
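The similarity search described above can be illustrated with a minimal Python sketch of a weighted cosine score. This is not MassBank's modified algorithm, whose details are not given here; the weighting exponents, the m/z tolerance, the greedy peak pairing, and the function names `peak_weight`, `weighted_cosine`, and `rank_candidates` are all assumptions chosen for illustration.

```python
from math import sqrt


def peak_weight(mz, intensity, intensity_power=0.5, mz_power=2.0):
    """Weight a peak by intensity and m/z; the exponents are assumptions."""
    return (intensity ** intensity_power) * (mz ** mz_power)


def weighted_cosine(query, reference, mz_tol=0.3):
    """Weighted cosine score between two peak lists.

    Each spectrum is a list of (m/z, intensity) tuples with non-negative
    intensities. A query peak is paired with the closest still-unused
    reference peak within mz_tol (a simple greedy pairing).
    """
    used = set()
    dot = 0.0
    for mz_q, int_q in query:
        best_j, best_d = None, None
        for j, (mz_r, _) in enumerate(reference):
            d = abs(mz_q - mz_r)
            if j not in used and d <= mz_tol and (best_d is None or d < best_d):
                best_j, best_d = j, d
        if best_j is not None:
            used.add(best_j)
            mz_r, int_r = reference[best_j]
            dot += peak_weight(mz_q, int_q) * peak_weight(mz_r, int_r)
    norm_q = sqrt(sum(peak_weight(m, i) ** 2 for m, i in query))
    norm_r = sqrt(sum(peak_weight(m, i) ** 2 for m, i in reference))
    if norm_q == 0.0 or norm_r == 0.0:
        return 0.0
    return dot / (norm_q * norm_r)


def rank_candidates(query, library, mz_tol=0.3):
    """Return (name, score) pairs sorted by descending similarity."""
    scores = [(name, weighted_cosine(query, peaks, mz_tol))
              for name, peaks in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

Here `library` is a hypothetical dictionary mapping compound names to peak lists. Because the weights emphasize intensities, a score of this kind is sensitive to the collision energy at which a spectrum was recorded, which is the motivation for comparing against merged spectra.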
MassBank provides two central access points (http://massbank.jp and http://massbank.eu), while the data are distributed over several servers, most of which are located in Japan, with two in the European Union and one in China. The implementation as a distributed service is one of the pillars of the MassBank design. It allows interested parties to join the MassBank network easily and make their data available to other researchers. They can also operate local private MassBank instances, which is particularly interesting for companies whose data must not be publicly available, and the infrastructure expenses remain within calculable limits for the MassBank providers. Furthermore, virtually anybody can contribute data to MassBank as long as the data were generated by up-to-date analytical methods and have been translated into the “MassBank Record Format,” which can be done with the “MassBank Tools for Contributors.” Providers of ESI data are asked to provide their data at a few different collision energies in positive and negative ion modes. Contributors should ideally deposit their data at their local database instances, which keeps their data clearly distinguishable from those of other contributors if desired; nevertheless, the data remain part of a larger collection.
MassBank Record Format
To guarantee a consistent data representation, new entries must be prepared in the MassBank record format, which consists of a summary section, a chemical compound section, a biological sample section, an analytical section, a spectral section, and a chemical drawing in Molfile format. The summary section contains general information such as authors, date, and copyright. The chemical section stores information about the measured chemical compound such as the name, compound class, chemical formula, mass, SMILES (Weininger 1988) and InChI codes (http://www.iupac.org/home/publications/e-resources/inchi.html), and links to external databases. The analytical section stores information about the mass spectrometric setup and the chromatography. The biological sample section contains information about the biological species and the sample preparation. The spectral section stores the spectrum and the peak annotations. A detailed and up-to-date specification of the format can be found at the project’s homepage (http://massbank.jp). To ease the data submission process, MassBank provides the “MassBank Tools for Contributors.”
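The sectioned layout of a record can be illustrated with a small Python sketch that reads a record and assigns each field to one of the sections named above. The tag names and prefixes (`CH$`, `AC$`, `SP$`, `PK$`), the `//` terminator, and the example record are assumptions based on the publicly documented format and are purely illustrative; the current specification on the project homepage is authoritative. The helper names `section_of` and `parse_record` are hypothetical.

```python
# Illustrative record; all values are made up for this sketch.
EXAMPLE_RECORD = """\
ACCESSION: XX000001
RECORD_TITLE: Example compound; LC-ESI-QTOF; MS2; CE 20 V
AUTHORS: A. Contributor
CH$NAME: Example compound
CH$FORMULA: C6H12O6
CH$EXACT_MASS: 180.06339
AC$INSTRUMENT_TYPE: LC-ESI-QTOF
PK$NUM_PEAK: 2
PK$PEAK: m/z int. rel.int.
  59.013 1200 450
  89.024 2660 999
//
"""

# Assumed mapping from tag prefix to record section.
SECTION_BY_PREFIX = {
    "CH": "chemical",
    "SP": "sample",
    "AC": "analytical",
    "PK": "spectral",
}


def section_of(tag):
    """Return the section a tag belongs to; untagged fields are summary."""
    prefix = tag.split("$", 1)[0] if "$" in tag else ""
    return SECTION_BY_PREFIX.get(prefix, "summary")


def parse_record(text):
    """Parse a record into {section: {tag: [values]}} plus a peak list."""
    record = {"sections": {}, "peaks": []}
    in_peaks = False
    for line in text.splitlines():
        if line.startswith("//"):  # assumed end-of-record marker
            break
        if in_peaks and line.startswith("  "):
            mz, intensity = line.split()[:2]
            record["peaks"].append((float(mz), float(intensity)))
            continue
        in_peaks = False
        if ": " not in line:
            continue  # skip lines this sketch does not understand
        tag, value = line.split(": ", 1)
        if tag == "PK$PEAK":
            in_peaks = True  # the following indented lines are peaks
            continue
        section = record["sections"].setdefault(section_of(tag), {})
        section.setdefault(tag, []).append(value.strip())
    return record
```

Calling `parse_record(EXAMPLE_RECORD)` groups the chemical fields under `"chemical"` and returns the two peaks as (m/z, intensity) tuples. The sketch ignores multi-line annotations and subtags, which a real parser would have to handle.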
MassBank Tools for Contributors
The first step in preparing data for MassBank is to convert the spectra into a suitable format. Vendor-specific binary data can be converted by the Java application Mass++ (Tanaka et al. 2014) or, if the data are in mzML format, by RMassBank (Stravs et al. 2013); neither of these applications is provided directly by MassBank. The output of these programs is combined with the chemical structure in Molfile format by Excel macros provided by the Record Editor. Additionally, the user has to enter some information manually in the Record Editor, e.g., the summary section. The Record Editor generates files in the “MassBank Record Format,” which can be uploaded to the database and managed with the Administration Tool. The Administration Tool is a Java web application with a three-tier architecture: it is accessed through a web browser, its application logic runs on a Tomcat web server (http://tomcat.apache.org), and its data are stored in a MySQL database (http://www.mysql.com). Furthermore, the MassBank developers provide detailed step-by-step user guides, for example on setting up a local MassBank instance, preparing a MassBank record, and using the Administration Tool.
MassBank Tools for Users
MassBank offers users the following tools:
- A spectrum similarity search applet for comparing experimental spectra with those stored in MassBank
- A batch service for running the similarity search on large datasets
- A quick search to find compounds by their chemical names, chemical formulas, or simple spectral input (m/z-intensity pairs)
- A peak search for retrieving spectra that exhibit certain m/z values or m/z differences, where the search can be performed by m/z values or chemical formulas
- Data browsing by hierarchies defined by contributors or by record indices sorted by various criteria such as instrument type
- A search by chemical substructures
- A metabolite prediction for unknown compounds, based on predicted peak-substructure relationships or on annotated neutral-loss fragments, where the putative substructures are searched against KEGG (Kanehisa et al. 2010) and KNApSAcK (Shinbo et al. 2006)
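The idea behind the peak search, retrieving spectra by m/z values or m/z differences, can be sketched in a few lines of Python. This is a simplified illustration and not MassBank's implementation: the library layout (a dictionary from accession to peak list), the tolerance, and the function names `has_peak`, `has_difference`, and `peak_search` are assumptions.

```python
from itertools import combinations


def has_peak(peaks, target_mz, tol=0.01):
    """True if any peak lies within tol of target_mz."""
    return any(abs(mz - target_mz) <= tol for mz, _ in peaks)


def has_difference(peaks, delta, tol=0.01):
    """True if any pair of peaks differs in m/z by delta (within tol)."""
    return any(abs(abs(a[0] - b[0]) - delta) <= tol
               for a, b in combinations(peaks, 2))


def peak_search(library, mz_values=(), mz_differences=(), tol=0.01):
    """Return accessions whose spectra satisfy all given criteria.

    library maps an accession to a list of (m/z, intensity) tuples.
    """
    hits = []
    for accession, peaks in library.items():
        values_ok = all(has_peak(peaks, mz, tol) for mz in mz_values)
        diffs_ok = all(has_difference(peaks, d, tol) for d in mz_differences)
        if values_ok and diffs_ok:
            hits.append(accession)
    return hits
```

A search with `mz_differences=[18.011]`, for instance, would retrieve spectra containing a pair of peaks separated by roughly the mass of a water loss. A search by chemical formula would first convert each formula to its monoisotopic mass and then proceed in the same way.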
- Hopley C, Bristow T, Lubben A, Simpson A, Bull E, Klagkou K, Herniman J, Langley J. Towards a universal product ion mass spectral library – reproducibility of product ion spectra across eleven different mass spectrometers. Rapid Commun Mass Spectrom. 2008;22(12):1779–86. doi:10.1002/rcm.3545.
- Horai H, Arita M, Kanaya S, Nihei Y, Ikeda T, Suwa K, Ojima Y, Tanaka K, Tanaka S, Aoshima K, Oda Y, Kakazu Y, Kusano M, Tohge T, Matsuda F, Sawada Y, Hirai MY, Nakanishi H, Ikeda K, Akimoto N, Maoka T, Takahashi H, Ara T, Sakurai N, Suzuki H, Shibata D, Neumann S, Iida T, Tanaka K, Funatsu K, Matsuura F, Soga T, Taguchi R, Saito K, Nishioka T. MassBank: a public repository for sharing mass spectral data for life sciences. J Mass Spectrom. 2010;45(7):703–14. doi:10.1002/jms.1777.
- Shinbo Y, Nakamura Y, Altaf-Ul-Amin M, Asahi H, Kurokawa K, Arita M, Saito K, Ohta D, Shibata D, Kanaya S. KNApSAcK: a comprehensive species-metabolite relationship database. In: Saito K, Dixon R, Willmitzer L, editors. Plant Metabolomics. Biotechnology in Agriculture and Forestry, vol 57. Berlin/Heidelberg: Springer; 2006. p. 165–81.
- Tanaka S, Fujita Y, Parry HE, Yoshizawa AC, Morimoto K, Murase M, Yamada Y, Yao J, Utsunomiya SI, Kajihara S, Fukuda M, Ikawa M, Tabata T, Takahashi K, Aoshima K, Nihei Y, Nishioka T, Oda Y, Tanaka K. Mass++: a visualization and analysis tool for mass spectrometry. J Proteome Res. 2014;13(8):3846–53. doi:10.1021/pr500155z.