VaProS: a database-integration approach for protein/genome information retrieval

Gojobori, Takashi; Ikeo, Kazuho; Katayama, Yukie; Kawabata, Takeshi; Kinjo, Akira R.; Kinoshita, Kengo; Kwon, Yeondae; Migita, Ohsuke; Mizutani, Hisashi; Muraoka, Masafumi; Nagata, Koji; Omori, Satoshi; Sugawara, Hideaki; Yamada, Daichi; Yura, Kei

doi:10.1007/s10969-016-9211-3

Open access
Published: 23 December 2016

Volume 17, pages 69–81, (2016)

Journal of Structural and Functional Genomics

Takashi Gojobori^1,2,
Kazuho Ikeo^2,
Yukie Katayama^3,
Takeshi Kawabata^4,
Akira R. Kinjo^4,
Kengo Kinoshita^5,6,
Yeondae Kwon^3,
Ohsuke Migita^7,8,
Hisashi Mizutani^2,
Masafumi Muraoka^2,
Koji Nagata^3,
Satoshi Omori^5,
Hideaki Sugawara^2,
Daichi Yamada^9 &
Kei Yura^2,9 (ORCID: orcid.org/0000-0003-0197-5478)

Abstract

Life science research now relies heavily on databases of genome sequences, transcription, protein three-dimensional (3D) structures, protein–protein interactions, phenotypes and so forth. The knowledge accumulated by omics research is so vast that a computer-aided search of data is now a prerequisite for starting a new study. In addition, a combinatory search across these databases may yield new ideas and hypotheses that can be examined by wet-lab experiments. By virtually integrating the related databases on the Internet, we have built a new web application that helps life science researchers retrieve expert knowledge stored in the databases and build new hypotheses about their research targets. This web application, named VaProS, emphasizes the interconnection between the functional information of genome sequences and protein 3D structures, such as the structural effect of a gene mutation. In this manuscript, we present the concept of VaProS, the databases and tools that can be accessed without any knowledge of database locations or data formats, and the power of the search, exemplified by an investigation of the molecular mechanisms of lysosomal storage disease. VaProS can be freely accessed at http://p4d-info.nig.ac.jp/vapros/.

Introduction

Advances in molecular biology have yielded a huge amount of biological data, including DNA/RNA/protein sequences [1–3], their expression levels [4], sequence differences among individuals [5], three-dimensional (3D) structures of biomolecules [6], phenotypes of organisms [7] and so forth. These data are stored in independent databases on the Internet, and researchers exploit these databases to gain new knowledge about the targets of their studies. Database mining facilitates knowledge acquisition and the building of new hypotheses for planning new experiments [8].

The growth in data size has been met by increases in storage capacity and by the invention of new algorithms for searching the whole data swiftly. A famous example of a tool for quick database search in this field is BLAST [9], which finds similar sequences in huge nucleotide and amino acid sequence databases. Further growth of the individual databases and the increasing variety of databases may have enhanced the chances of performing novel experiments by extending the scope of hypotheses, yet the lack of technology for integrating different types of databases, and of an application for searching multiple databases, has precluded extensive use of this approach. Researchers aiming for an integrated search of different databases must approach the databases one by one, learn how to use each of them and obtain the information relevant to their studies. They then have to integrate the data obtained from the different databases by themselves. This process requires tedious labour as well as skills for manipulating data in different formats. Hence, the biggest hurdle to overcome in current life science is the complexity of integrating databases in a way that enables us to come up with novel ideas and hypotheses. Once up-to-date data are comprehensively integrated, researchers with experience in a specific field can start deducing hypotheses in a data-driven manner.

Efforts to integrate the management of different databases have been made by a number of groups [10–14]. Linking data with a common framework is one possible approach, and Semantic Web technologies have become increasingly popular in recent years [15]. While Semantic Web technologies based on linked open data and ontologies are a promising approach, the extremely diverse set of ontologies and the non-uniform use of URIs (Uniform Resource Identifiers) by different parties to describe identical resources make it difficult to integrate various information resources without extensive manual intervention. Although some efforts have been dedicated to solving these difficulties (e.g., http://identifiers.org), it will take time for the research community to agree on a unified convention.

To overcome these difficulties in searching multiple life science databases, we started developing a new type of application that searches databases in different locations simultaneously with a simple query and displays the results in a simple interface at http://p4d-info.nig.ac.jp/vapros/. We named the application VaProS, for VAriation effect on PROtein Structure and function. The name derives from the aim of the application, namely to focus on analysing the effects of DNA sequence variations on protein structure and function. VaProS aims to realize the idea of a “data cloud”, that is, to retrieve data without any knowledge of the databases scattered across the Internet.

What distinguishes VaProS from other general database integration efforts is its emphasis on the relationships among biological molecules and phenomena. These relationships are governed by the central dogma; hence all incidents can be described in either a gene-centered or a protein-centered manner. Phenotypic changes of an organism likely derive from changes in the biological system of the organism, which is sustained by a network of biomolecules, and those biomolecules are ultimately encoded in DNA. This chain of reasoning runs in the opposite direction to the central dogma, and hence the organization of data and databases in VaProS follows the information flow in the central dogma. Technically, the search results from the various databases are interconnected using the protein as a hub of information. The following sections present the details of VaProS and an example of its usage.

Materials and methods

Databases and tools on the internet

Table 1 lists the databases that are integrated in VaProS and the tools that visualize the search results. VaProS is made up of 16 databases and 15 tools. The latest information on the databases, namely their versions and sizes, is given at http://p4d-info.nig.ac.jp/vapros/statistics.html. Each database was integrated either through a dynamic link or by copying data from the original site to the local VaProS site. Ideally, all the databases should be accessed dynamically to avoid time lags in the data and to save local disk space, but such dynamic access often sacrifices a prompt response to a query. Therefore, we downloaded part of the data from each database to achieve an optimum response speed. Data updates are scheduled once every six months to keep abreast of the latest data in all the databases. VaProS deals with data on humans, rats and mice and focuses on phenomena related to humans.

Table 1 Components of VaProS

Data integration and data presentation in VaProS

VaProS is unique in its style of data integration. VaProS tries to integrate different databases dynamically, and the data in the databases are related to one another through the UniProt accession key. The central dogma guarantees the relationship between the biomolecules in an organism; hence all phenotypes should basically stem from perturbations of biomolecules. Therefore, phenomena observed in an organism can be tagged to either DNA or protein. We chose a protein identifier to tag all the other data, because VaProS is aimed at the analysis of protein variation.
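The protein-as-hub idea can be illustrated with a small sketch. This is not VaProS code: the function name `integrate_by_accession`, the source names and the record layout are hypothetical, and the sample rows merely reuse values mentioned in this article (GLB1, PDB ID 3THC) together with P16278, the UniProt accession of human beta-galactosidase.

```python
from collections import defaultdict

def integrate_by_accession(sources):
    """Merge records from several sources into one view per UniProt accession.

    `sources` maps a source name to a list of (accession, payload) pairs.
    The layout is an assumption made for this sketch.
    """
    hub = defaultdict(dict)
    for source_name, records in sources.items():
        for accession, payload in records:
            # Collect every payload that a source reports for this protein.
            hub[accession].setdefault(source_name, []).append(payload)
    return dict(hub)

# Illustrative rows only; the source names are hypothetical.
sources = {
    "gene": [("P16278", "GLB1")],
    "structure": [("P16278", "PDB 3THC")],
    "phenotype": [
        ("P16278", "GM1-gangliosidosis type I"),
        ("P16278", "GM1-gangliosidosis type II"),
    ],
}

view = integrate_by_accession(sources)
print(view["P16278"]["structure"])  # ['PDB 3THC']
```

Keying everything on one identifier means that a new source can be attached without touching the others, at the cost of requiring an accession mapping for every record.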

UniProtKB [2], GeneCards [12] and COSMIC [13] take a similar approach to integrating relevant data. VaProS emphasizes graphical presentation of the search results, as found in “Molecular Interactions” and “TagCloud”, and analyses of protein 3D structures, as found in “hGtoP”, “3D Interaction” and “Natural Ligand Database”.

Search method

VaProS accepts keywords, DNA/protein sequences, EntrezGene IDs and UniProtKB accessions as queries (Fig. 1). A keyword can be a gene name, a protein name, a ligand (drug) name, a disease (phenotype) name or an identifier found in the databases. Keyword input is assisted by a keyword-suggestion function: when the input is incomplete, VaProS looks up related keywords in the keyword database and shows a list of candidate words below the query input window. Once the Search button is pressed, VaProS passes the input to NOREN, an original tool that searches for all IDs in the databases related to the query. NOREN is based on the ID mapping table provided by UniProtKB [2] and on BLAST [9]. The result of the query is presented to the user as a list of candidates, categorized into three types, namely Gene/Protein, Ligand and Phenotype (Fig. 2). The user can select the most relevant element in the list, press the “Details (Go)” button and obtain the results of the search performed with the IDs relevant to the keywords (Fig. 3). The results are presented through the tools listed in Table 1. The search results shown by each tool can be opened by clicking the corresponding icon on the left in Fig. 3.
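The keyword-suggestion step and the grouping of candidates into three types can be sketched as follows. The text does not describe how VaProS or NOREN implement these steps, so the prefix matching, the `KEYWORD_INDEX` table and both function names are assumptions made purely for illustration.

```python
# Hypothetical keyword index: keyword -> candidate type.
# The three type names come from the text; the entries are illustrative.
KEYWORD_INDEX = {
    "gangliosidosis": "Phenotype",
    "galactose": "Ligand",
    "GLB1": "Gene/Protein",
    "GM2A": "Gene/Protein",
    "HEXA": "Gene/Protein",
    "HEXB": "Gene/Protein",
}

def suggest(partial, index=KEYWORD_INDEX, limit=10):
    """Return keywords that start with the partial input (case-insensitive)."""
    prefix = partial.strip().lower()
    if not prefix:
        return []
    matches = [k for k in index if k.lower().startswith(prefix)]
    return sorted(matches, key=str.lower)[:limit]

def group_candidates(hits, index=KEYWORD_INDEX):
    """Group hits into the three candidate types presented to the user."""
    grouped = {"Gene/Protein": [], "Ligand": [], "Phenotype": []}
    for hit in hits:
        grouped[index[hit]].append(hit)
    return grouped

print(suggest("hex"))                  # ['HEXA', 'HEXB']
print(group_candidates(suggest("g")))
```

A real keyword database would be far larger, so a sorted index or a trie would likely replace the linear scan used here.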

Results and discussion

As a case study to show the usage of VaProS, we explored the current knowledge on lysosomal storage diseases (LSDs) and built a tenable hypothesis. Similar analyses can be conducted on different diseases by accessing http://p4d-info.nig.ac.jp/vapros.

Lysosomal storage disease

Lysosomes are subcellular organelles responsible for the physiological turnover of cell constituents. They contain catabolic enzymes that require a low pH environment for their optimal function. LSDs are a heterogeneous group of more than 50 rare inherited disorders characterized by the accumulation of undigested or partially digested macromolecules (Table 2). LSDs ultimately result in cellular dysfunction and clinical abnormalities. LSDs are caused by deficiencies or defects in lysosomal enzymes, in proteins necessary for the normal post-translational modification of lysosomal enzymes, in the activator proteins of lysosomal enzymes, and in proteins important for proper intracellular trafficking between the lysosome and other intracellular compartments. The individual diseases are rare, but LSDs as a group affect many people around the world, with a frequency of about one in every 7000–8000 live births [30, 31].

Table 2 Lysosomal storage diseases

Search in the first step

A search with the keyword “gangliosidosis”, one of the major groups of LSDs, resulted in six candidates, as shown in Fig. 4. As shown in Table 2, gangliosidoses are classified into two types, GM1 and GM2, each of which is further classified into three subtypes. The estimated incidence of GM1-gangliosidosis is 1 per 100,000 to 200,000 births, and those of GM2-gangliosidosis are 1 per 360,000 births for Tay-Sachs disease and 1 per 310,000 or 1,000,000 births for Sandhoff disease. The GM2-gangliosidosis AB variant is extremely rare [32]. Each line in the search result happened to correspond to an individual entry for a genetic disease/disorder in the OMIM database [7]. The three types of GM1-gangliosidosis (types I, II and III) were related to the same “Molecule Symbol”, namely the GLB1 gene, whereas Tay-Sachs disease, Sandhoff disease and the AB variant of GM2-gangliosidosis were related to the HEXA, HEXB and GM2A genes, respectively. Each gene was linked to the databases listed in Table 1. Ticking the box on the far left in Fig. 4 and pressing the “Details (Go)” button at the top led the user to further details of the selected item. In the following sections, the search result of each tool listed in Table 1 is explained.

Molecular interactions

The interaction network of the proteins encoded by HEXA and HEXB was found in the “Molecular Interactions” window (Fig. 5). This window can be displayed by ticking both HEXA and HEXB in the table shown in Fig. 4 and pressing the “Details (Go)” button. A protein is represented by a big node and a ligand by a small node. A protein–protein or protein–ligand interaction is represented by an edge. Figure 5 shows that eight proteins and four ligands interact with both HEXA and HEXB, and that each protein interacts with a number of other proteins and ligands. These interactions were extracted from the different databases listed in Table 1. In Fig. 5, the nodes in red are proteins associated with diseases. This information was extracted from OMIM (Table 1), and the catalog of specific diseases is given on the right side of the window. There are two red nodes that interact with both HEXA and HEXB, which suggests disease–disease interactions. By right-clicking a node, the protein–protein interactions can be extended. The path between two nodes in the window can be detected automatically using “Path Search” in the top menu.

By clicking a node or an edge, detailed information on the node or edge can be displayed at the bottom right of the window. In Fig. 5, HEXA was selected; hence the details of HEXA were presented on the right. The link to “3D Interaction” shows the 3D structural information of the HEXA protein. In 3D Interaction, the SiteTable/SitesByVariants link leads users to the information that VaProS aims to provide, namely the relationship between variations in DNA and protein structure/function.
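Finding the partners shared by two proteins and searching for a path between two nodes are standard graph operations. The sketch below shows one way they could be computed; it is not the VaProS implementation, the breadth-first search is an assumed choice for “Path Search”, and every node name other than HEXA and HEXB is a placeholder rather than a real interaction partner.

```python
from collections import deque

def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def shared_partners(graph, a, b):
    """Nodes that interact with both a and b."""
    return graph.get(a, set()) & graph.get(b, set())

def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path or None."""
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in sorted(graph[node]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None

# Placeholder network: P1..P3 stand for proteins, LIG1 for a ligand.
edges = [("HEXA", "P1"), ("HEXB", "P1"), ("HEXA", "LIG1"),
         ("HEXB", "LIG1"), ("HEXA", "P2"), ("P2", "P3")]
graph = build_graph(edges)

print(sorted(shared_partners(graph, "HEXA", "HEXB")))  # ['LIG1', 'P1']
print(shortest_path(graph, "HEXB", "P3"))
```

Breadth-first search returns a path with the fewest edges; weighting edges by evidence would call for a different algorithm.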

hGtoP

hGtoP provides the relationship between genes and proteins, as its name suggests (G in hGtoP stands for Gene/Genome and P for Protein). The original GtoP was developed by Kawabata et al. [24]. VaProS includes a human-specific GtoP as one of its tools. With hGtoP, the structural information of the query protein is easily found. In addition, homologues of the protein in different species can be found. “3D Interaction” and “hGtoP” contain similar information about protein 3D structures; however, the former focuses more on 3D modelling of complex structures, and the latter on comparative genomics. The PDB information in Table 2 was obtained with hGtoP and 3D Interaction. The information clarified that some of the proteins involved in LSDs do not have structural information yet. In other words, Table 2 provides valuable information for structural genomics of LSDs, namely the target proteins for 3D structure determination.

Natural ligand database

Natural Ligand Database (NLDB) [25] provides models of protein structures with natural ligands. The idea stemmed from the fact that many ligands in PDB are ligands modified for the sake of crystallisation, and a bridge between those modified ligands and the natural ligands is needed to enhance the 3D structural information in PDB. The search query “GM1 gangliosidosis” led the user to the causative gene GLB1. NLDB showed that GLB1 was involved in 15 KEGG reactions, and these 15 reactions were classified into five pathways according to the “UniProt search view”. The same 15 reactions can be found in the “Pathway DB” tool (Table 1). Of the five pathways in NLDB, glycosphingolipid biosynthesis (hsa00604) contained the reaction of GM1 degradation (R06010). For this reaction, 56 natural ligand complexes derived from the proteins of various species were registered in NLDB. The link to human beta-galactosidase, the product of the GLB1 gene, in complex with galactose (PDB ID: 3THC) led to the 3D structure of the ligand–protein complex with the reported variations in amino acid residues. The variations in amino acid residues around the ligand-binding site were highlighted in the table of the NLDB window. In this entry, ten variations were reported around the ligand-binding site, and seven of them were related to diseases, namely three to GM1 type I, one to GM1 type II, one to GM1 type III, and two to mucopolysaccharidosis IV-B.

COXPRESdb

Gene co-expression sometimes sheds light on the relationship between genes, and COXPRESdb [26] provides a user-friendly interface to gene co-expression information in humans, mice and rats. The search query “GM2 gangliosidosis” on VaProS led the user to a list of causative genes that included HEXA and HEXB. COXPRESdb showed the co-expression for HEXA (PCC = 0.43) and HEXB (PCC = 0.45), which, according to OMIM, are related to Tay-Sachs disease and Sandhoff disease, respectively. By following the link to COXPRESdb, the user can also check the co-expression networks of HEXA and HEXB, which led to the finding that ten more lysosomal proteins are tightly co-expressed with them.
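PCC here refers to the Pearson correlation coefficient between the expression profiles of two genes. The following sketch shows the calculation itself; it does not reproduce the COXPRESdb pipeline, and the two expression vectors are invented numbers used only to exercise the function.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(var_x * var_y)

# Invented expression levels of two genes across five samples.
gene_a = [2.1, 3.4, 1.8, 4.0, 2.9]
gene_b = [1.9, 3.0, 2.2, 3.8, 2.5]
print(round(pearson(gene_a, gene_b), 2))
```

Values close to 1 indicate that the two genes rise and fall together across samples, values near 0 indicate no linear relationship.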

Other tools

GNP Expression, Phenotype, Autophagy DB, Genome Explorer and Mutation@A Glance (Table 1) show the search results in a tabulated or graphical form. The user can further analyze the database of each tool by following the link in the search result.

S-VAR is a special tool that evaluates the impact of an amino acid substitution on the function of the protein. When a specific mutation is entered in the S-VAR window, the tool starts several programs that evaluate the impact of the mutation [33–36] and returns the individual results and a consensus result to the user at intervals.
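How S-VAR derives its consensus is not described here. As a minimal sketch under that caveat, a simple majority vote over the individual calls could look like the following; the predictor names, the labels and the tie rule are all hypothetical.

```python
from collections import Counter

def consensus(calls):
    """Majority vote over predictor calls; ties are left undetermined.

    `calls` maps a predictor name to its label for one substitution.
    """
    if not calls:
        raise ValueError("at least one prediction is required")
    ranked = Counter(calls.values()).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "undetermined"
    return ranked[0][0]

# Hypothetical calls from four predictors for one substitution.
calls = {
    "predictor_a": "damaging",
    "predictor_b": "damaging",
    "predictor_c": "tolerated",
    "predictor_d": "damaging",
}
print(consensus(calls))  # damaging
```

A production tool would more likely weight each predictor by its reported score or accuracy rather than count votes equally.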

Building hypothesis on the relationship between phenotype and protein 3D structure

Combining the search results from the tools can provide a basis for building a hypothesis that can be verified by wet-lab experiments. The search query “GM1 gangliosidosis” on VaProS led the user to the current knowledge that the causative gene is GLB1, which encodes the lysosomal enzyme β-D-galactosidase. By following the link to OMIM [7], the user can obtain details of the disease, namely that GM1 gangliosidosis is classified into three types in accordance with the age of onset and severity: type I (infantile), type II (juvenile) and type III (adult), as shown in Table 2.

“3D Interaction” summarized the variations leading to each type of the disease on the protein 3D structure (Fig. 7). Visual inspection of the figure shows that the variations tend to be located inside the protein. Indeed, the ratio of buried residues was 95.5% for type I, 88.2% for type II and 86.7% for type III. These ratios are significantly higher than the average for the protein (Table 3). Note that the ratio of buried variation sites is highest in type I and lowest in type III. Generally, mutations at buried sites often make the protein less stable than the native one. Hence the observation suggests that the variations in type I have more impact on the stability of the protein than those in type III. 3D Interaction also provided the amino acid frequencies of homologous proteins. A mutation to an amino acid that is rare among homologues implies that this type of amino acid has seldom been fixed at that site during molecular evolution. The ratios of mutations to amino acids that no homologue uses were 78.4% (type I), 73.7% (type II) and 53.3% (type III). Serious phenotypes are expected from mutations to amino acid types rarely observed in homologues. The trends in buriedness and in amino acid types from type I to type III apparently correlate with the degree of severity of each type of GM1 gangliosidosis. A similar trend was discussed by Ohto et al. when they determined the 3D structure of the protein [37]. VaProS enables such complex hypothesis building in a short period of time.
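The ratio of buried residues used above is a simple proportion. The sketch below shows the arithmetic under stated assumptions: each variation site is described by a relative solvent accessibility between 0 and 1, and a site counts as buried below a cutoff of 0.25. Neither the cutoff nor the sample values come from this article.

```python
def buried_ratio(accessibilities, cutoff=0.25):
    """Percentage of sites whose relative accessibility is below the cutoff.

    The cutoff is an assumed value, not the criterion used by 3D Interaction.
    """
    if not accessibilities:
        raise ValueError("at least one site is required")
    buried = sum(1 for value in accessibilities if value < cutoff)
    return 100.0 * buried / len(accessibilities)

# Invented relative accessibilities for the variation sites of one type.
sites = [0.02, 0.10, 0.00, 0.31, 0.05, 0.18, 0.07, 0.12]
print(f"{buried_ratio(sites):.1f}%")  # 87.5%
```

Comparing this percentage for the variation sites with the same percentage computed over all residues of the protein gives the kind of contrast summarized in Table 3.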

Table 3 Summary of the mutated sites of GLB1 on protein 3D structure for GM1-gangliosidosis

Conclusion

Here we launched VaProS, a new type of database integration application. VaProS enables a quick search of multiple databases and interrelates the search results. The application can serve as a textbook from which researchers in different fields acquire expert knowledge, and as a tool for building data-driven hypotheses that can be tested by wet-lab experiments [16–23].

References

Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW (2013) Nucleic Acids Res 41:D36–D42
Cook CE, Bergman MT, Finn RD, Cochrane G, Birney E, Apweiler R (2016) Nucleic Acids Res 44:D20–D26
Mashima J, Kodama Y, Kosuge T, Fujisawa T, Katayama T, Nagasaki H, Okuda Y, Kaminuma E, Ogasawara O, Kokubo K, Nakamura Y, Takagi T (2016) Nucleic Acids Res 44:D51–D57
Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, Holko M, Yefanov A, Lee H, Zhang N, Robertson CL, Serova N, Davis S, Soboleva A (2013) Nucleic Acids Res 41:D991–D995
Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K (2001) Nucleic Acids Res 29:308–311
Berman H, Henrick K, Nakamura H (2003) Nat Struct Mol Biol 10:980
Online Mendelian Inheritance in Man, OMIM. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University (Baltimore, MD). http://omim.org/
Gewin V (2016) Nature 529:117–119
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL (2009) BMC Bioinform 10:421
Wikipedia (2016). http://en.wikipedia.org/wiki/Wikipedia
NCBI Resource Coordinators (2013) Nucleic Acids Res 41:D8–D20
Stelzer G, Dalah I, Iny Stein T, Satanower Y, Rosen N, Nativ N, Oz-Levi D, Olender T, Belinky F, Bahir I, Krug H, Perco P, Mayer B, Kolker E, Safran M, Lancet D (2011) Hum Genom 5:709–717
Forbes SA, Beare D, Gunasekaran P, Leung K, Bindal N, Boutselakis H, Ding M, Bamford S, Cole C, Ward S, Kok CY, Jia M, De T, Teague JW, Stratton MR, McDermott U, Campbell PJ (2015) Nucleic Acids Res 43:D805–D811
Katayama T, Wilkinson MD, Micklem G, Kawashima S, Yamaguchi A, Nakao M, Yamamoto Y, Okamoto S, Oouchida K, Chun H-W, Aerts J, Afzal H, Antezana E, Arakawa K, Aranda B, Belleau F, Bolleman J, Raoul Bonnal J, Chapman B, Peter Cock J, Eriksson T, Paul Gordon M, Goto N, Hayashi K, Horn H, Ishiwata R, Kaminuma E, Kasprzyk A, Kawaji H, Kido N, Young Kim J, Kinjo AR, Konishi F, Kwon K-H, Labarga A, Lamprecht A-L, Lin Y, Lindenbaum P, McCarthy L, Morita H, Murakami K, Nagao K, Nishida K, Nishimura K, Nishizawa T, Ogishima S, Ono K, Oshita K, Park K-J, Prins P, Saito TL, Samwald M, Satagopam VP, Shigemoto Y, Smith R, Splendiani A, Sugawara H, Taylor J, Vos RA, Withers D, Yamasaki C, Zmasek CM, Kawamoto S, Okubo K, Asai K, Takagi T (2013) J Biomed Semant 4:1–17
Katayama T, Wilkinson MD, Aoki-Kinoshita KF, Kawashima S, Yamamoto Y, Yamaguchi A, Okamoto S, Kawano S, Kim J-D, Wang Y, Wu H, Kano Y, Ono H, Bono H, Kocbek S, Aerts J, Akune Y, Antezana E, Arakawa K, Aranda B, Baran J, Bolleman J, Bonnal RJ, Buttigieg PL, Campbell MP, Chen Y-a, Chiba H, Cock PJ, Cohen KB, Constantin A, Duck G, Dumontier M, Fujisawa T, Fujiwara T, Goto N, Hoehndorf R, Igarashi Y, Itaya H, Ito M, Iwasaki W, Kalas M, Katoda T, Kim T, Kokubu A, Komiyama Y, Kotera M, Laibe C, Lapp H, Lutteke T, Marshall MS, Mori T, Mori H, Morita M, Murakami K, Nakao M, Narimatsu H, Nishide H, Nishimura Y, Nystrom-Persson J, Ogishima S, Okamura Y, Okuda S, Oshita K, Packer NH, Prins P, Ranzinger R, Rocca-Serra P, Sansone S, Sawaki H, Shin S-H, Splendiani A, Strozzi F, Tadaka S, Toukach P, Uchiyama I, Umezaki M, Vos R, Whetzel PL, Yamada I, Yamasaki C, Yamashita R, York WS, Zmasek CM, Kawamoto S, Takagi T (2014) J Biomed Semant 5:1–13
NCBI Resource Coordinators (2016) Nucleic Acids Res 44:D7–19
The UniProt Consortium (2015) Nucleic Acids Res 43:D204–D212
Chatr-aryamontri A, Breitkreutz BJ, Oughtred R, Boucher L, Heinicke S, Chen D, Stark C, Breitkreutz A, Kolas N, O’Donnell L, Reguly T, Nixon J, Ramage L, Winter A, Sellam A, Chang C, Hirschman J, Theesfeld C, Rust J, Livstone MS, Dolinski K, Tyers M (2015) Nucleic Acids Res 43:D470–D478
Bento AP, Gaulton A, Hersey A, Bellis LJ, Chambers J, Davies M, Kruger FA, Light Y, Mak L, McGlinchey S, Nowotka M, Papadatos G, Santos R, Overington JP (2014) Nucleic Acids Res 42:D1083–D1090
Law V, Knox C, Djoumbou Y, Jewison T, Guo AC, Liu Y, Maciejewski A, Arndt D, Wilson M, Neveu V, Tang A, Gabriel G, Ly C, Adamjee S, Dame ZT, Han B, Zhou Y, Wishart DS (2014) Nucleic Acids Res 42:D1091–D1097
Kerrien S, Aranda B, Breuza L, Bridge A, Broackes-Carter F, Chen C, Duesbury M, Dumousseau M, Feuermann M, Hinz U, Jandrasits C, Jimenez RC, Khadake J, Mahadevan U, Masson P, Pedruzzi I, Pfeiffenberger E, Porras P, Raghunath A, Roechert B, Orchard S, Hermjakob H (2011) Nucleic Acids Res 40:D841–D846
Pratt D, Chen J, Welker D, Rivas R, Pillich R, Rynkov V, Ono K, Miello C, Hicks L, Szalma S, Stojmirovic A, Dobrin R, Braxenthaler M, Kuentzer J, Demchak B, Ideker T (2015) Cell Syst 1:302–305
Fabregat A, Sidiropoulos K, Garapati P, Gillespie M, Hausmann K, Haw R, Jassal B, Jupe S, Korninger F, McKay S, Matthews L, May B, Milacic M, Rothfels K, Shamovsky V, Webber M, Weiser J, Williams M (2016) Nucleic Acids Res 44:D481–D487
Kawabata T, Fukuchi S, Homma K, Ota M, Araki J, Ito T, Ichiyoshi N, Nishikawa K (2002) Nucleic Acids Res 30:294–298
Murakami Y, Omori S, Kinoshita K (2016) J Struct Funct Genomics doi:10.1007/s10969-016-9206-0
Okamura Y, Aoki Y, Obayashi T, Tadaka S, Ito S, Narise T, Kinoshita K (2015) Nucleic Acids Res 43:D82–D86
Hijikata A, Raju R, Keerthikumar S, Ramabadran S, Balakrishnan L, Ramadoss SK, Pandey A, Mohan S, Ohara O (2010) DNA Res 17:197–208
Kawabata T (2016) J Struct Funct Genom. doi:10.1007/s10969-016-9208-y
Homma K, Suzuki K, Sugawara H (2011) Nucleic Acids Res 39:D986–D990
Meikle PJ, Hopwood JJ, Clague AE, Carey WF (1999) J Am Med Assoc 281:249–254
Poorthuis BJ, Wevers RA, Kleijer WJ, Groener JE, de Jong JG, van Weely S, Niezen-Koning KE, van Diggelen OP (1999) Hum Genet 105:151–156
Brunetti-Pierri N, Scaglia F (2008) Mol Genet Metab 94:391–396
Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P, Kondrashov AS, Sunyaev SR (2010) Nat Methods 7:248–249
Kumar P, Henikoff S, Ng PC (2009) Nat Protoc 4:1073–1081
Choi Y, Sims GE, Murphy S, Miller JR, Chan AP (2012) PLoS one 7:e46688
Thomas PD, Campbell MJ, Kejariwal A, Mi H, Karlak B, Daverman R, Diemer K, Muruganujan A, Narechania A (2003) Genome Res 13:2129–2141
Ohto U, Usui K, Ochi T, Yuki K, Satow Y, Shimizu T (2012) J Biol Chem 287:1801–1812

Acknowledgements

This research is supported by the Platform Project for Supporting in Drug Discovery and Life Science Research (Platform for Drug Discovery, Informatics, and Structural Life Science) from Japan Agency for Medical Research and Development (AMED).

Author information

Authors and Affiliations

Computational Bioscience Research Center, Biological and Environmental Sciences and Engineering, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
Takashi Gojobori
National Institute of Genetics, Shizuoka, 411-8540, Mishima, Japan
Takashi Gojobori, Kazuho Ikeo, Hisashi Mizutani, Masafumi Muraoka, Hideaki Sugawara & Kei Yura
Graduate School of Agricultural and Life Sciences, University of Tokyo, Bunkyo, Tokyo, 113-8657, Japan
Yukie Katayama, Yeondae Kwon & Koji Nagata
Institute for Protein Research, Osaka University, Suita, Osaka, 565-0871, Japan
Takeshi Kawabata & Akira R. Kinjo
Graduate School of Information Sciences, Tohoku University, Miyagi, Sendai, 980-8597, Japan
Kengo Kinoshita & Satoshi Omori
Tohoku Medical Megabank Organization, Tohoku University, Miyagi, Sendai, 980-8573, Japan
Kengo Kinoshita
Department of Maternal-Fetal Biology, National Research Institute for Child Health and Development, Setagaya, Tokyo, 157-8535, Japan
Ohsuke Migita
Department of Pediatrics, St. Marianna University School of Medicine, Miyamae, Kawasaki, 216-8511, Japan
Ohsuke Migita
Center for Informational Biology, Ochanomizu University, 2-1-1, Otsuka, Bunkyo, Tokyo, 112-8610, Japan
Daichi Yamada & Kei Yura

Corresponding author

Correspondence to Kei Yura.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Gojobori, T., Ikeo, K., Katayama, Y. et al. VaProS: a database-integration approach for protein/genome information retrieval. J Struct Funct Genomics 17, 69–81 (2016). https://doi.org/10.1007/s10969-016-9211-3

Received: 03 September 2016
Accepted: 05 December 2016
Published: 23 December 2016
Issue Date: December 2016
DOI: https://doi.org/10.1007/s10969-016-9211-3
