The Structural Biology Knowledgebase: a portal to protein structures, sequences, functions, and methods
The Protein Structure Initiative’s Structural Biology Knowledgebase (SBKB, URL: http://sbkb.org) is an open web resource designed to turn the products of the structural genomics and structural biology efforts into knowledge that can be used by the biological community to understand living systems and disease. Here we will present examples on how to use the SBKB to enable biological research. For example, a protein sequence or Protein Data Bank (PDB) structure ID search will provide a list of related protein structures in the PDB, associated biological descriptions (annotations), homology models, structural genomics protein target status, experimental protocols, and the ability to order available DNA clones from the PSI:Biology-Materials Repository. A text search will find publication and technology reports resulting from the PSI’s high-throughput research efforts. Web tools that aid in research, including a system that accepts protein structure requests from the community, will also be described. Created in collaboration with the Nature Publishing Group, the Structural Biology Knowledgebase monthly update also provides a research library, editorials about new research advances, news, and an events calendar to present a broader view of structural genomics and structural biology.
Keywords: Protein; Protein production; Structural biology; Structural databases; Structural genomics; Theoretical models
Abbreviations
CGI: Common Gateway Interface
CSMP: PSI Center for Structures of Membrane Proteins
JCMM: PSI Joint Center for Molecular Modeling
JCSG: PSI Joint Center for Structural Genomics
MCSG: PSI Midwest Center for Structural Genomics
NESG: PSI Northeast Structural Genomics Consortium
NMHRCM: PSI New Methods in High-Resolution Comparative Modeling
NIGMS: National Institute of General Medical Sciences
NPG: Nature Publishing Group
NYSGXRC: PSI New York SGX Research Center for Structural Genomics
PepcDB: Protein expression purification and crystallization database
PDB: Protein Data Bank
PMP: Protein Model Portal
PSI: Protein Structure Initiative
RSS: Really Simple Syndication
SBKB: Structural Biology Knowledgebase
SOAP: Simple Object Access Protocol
WSDL: Web Services Description Language
When the Protein Structure Initiative (PSI) began in 2000, project results were communicated independently through publications [1], Protein Data Bank (PDB) [2, 3] structure depositions, individual PSI websites, and presentations at meetings. The PSI-1 and PSI-2 centers focused their efforts on rapidly determining protein structures on a genomic scale, with an emphasis on covering sequence-structure space. To accomplish this, the centers developed a wide variety of tools to carefully select targets and many new technologies to determine and annotate the structures. However, there was no centralized access point for this information. It became clear that, for the biological community to get the maximum benefit from the products of the PSI effort, a single centralized website was needed. In February 2008, the Structural Genomics Knowledgebase was created. It combined the PSI products with data from publicly available biological resources to present comprehensive information, including experimental data prior to publication, and so jumpstart biological research. In September 2008, the Structural Genomics Knowledgebase entered into a collaboration with the Nature Publishing Group (NPG) to become a “Gateway” site, delivering editorial content about the latest structure and technology reports, a research library, an events calendar, and science news in addition to the searchable protein database.
In keeping with the recent start of the third phase of the Protein Structure Initiative, PSI:Biology, the website changed its name in August 2010 to the Structural Biology Knowledgebase (SBKB, URL http://sbkb.org), and developed more tools to aid in protein research design. It also improved features to foster collaborations between the biological community and the PSI. Editorial content, displayed on the SBKB as the “Structural Biology Update”, is updated on the third Thursday of each month, with database updates occurring on a weekly basis. In this article, we describe all of the features available on the SBKB that can be used to enable biological research, and present some examples of its use.
Navigating the SBKB
The SBKB homepage provides entry points to the following content and functionalities:
Left navigation menu
These menu items give easy access to topics relevant to biological and biomedical research, such as resources for target, structure, methods, models, and publication information. The menu also contains items for PSI and SBKB information such as FAQs, classroom tutorials, PSI Policies and Reports, links to PSI administrative and Center sites, PSI funding opportunities, and links to NPG online resources.
Structural biology update
This section of the SBKB provides a collection of recent research and technical highlights from the PSI and broader structural biology community, news, upcoming events, and a research library of PSI and other structural biology articles provided by the NPG.
The PSI Featured Molecule highlight [5] consists of simplified explanations and illustrations of interesting PSI protein structures, with molecular graphics that allow the user to interact with the molecule and learn about its biological role. A new molecule is selected each month.
A central search box queries the SBKB database for all sequence, structure, functional or technological data related to a sequence, PDB ID code or text string. Text searches will also return matching highlights published as part of the Structural Biology Update.
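As a rough illustration of how a single search box might distinguish the three kinds of input described above, the following Python sketch classifies a query string as a PDB ID, a protein sequence, or free text. The function name, the minimum sequence length, and the accepted residue alphabet are assumptions made for this example; the SBKB's actual dispatch logic is not described in the text.

```python
import re

# One-letter codes for the 20 standard amino acids, plus X for unknown
# residues (the inclusion of X is an assumption for this sketch).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWYX")

# Classic PDB IDs are four characters: a digit followed by three alphanumerics.
PDB_ID_PATTERN = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def classify_query(query: str, min_sequence_length: int = 10) -> str:
    """Return 'pdb_id', 'sequence', or 'text' for a search string (heuristic)."""
    stripped = query.strip()
    if PDB_ID_PATTERN.match(stripped):
        return "pdb_id"
    # Sequences are often pasted with spaces or line breaks, so remove them
    # before checking the residue alphabet.
    compact = "".join(stripped.split()).upper()
    if len(compact) >= min_sequence_length and set(compact) <= AMINO_ACIDS:
        return "sequence"
    return "text"
```

This is only a heuristic: a short peptide or an unusual phrase made entirely of residue letters could be misclassified, so a real interface would likely let the user override the guess.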
Nature e-alerts and RSS feeds
Users can subscribe to a monthly electronic Table of Contents alert service (e-alert) on Nature.com with links to the SBKB’s latest content. Alternatively, two RSS (Really Simple Syndication) feeds, which can be managed by a user’s web browser, keep readers apprised of (a) the month’s Structural Biology Update content and (b) the latest PSI structures released by the PDB.
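Besides a web browser, an RSS feed can also be read by a short script. The sketch below parses a generic RSS 2.0 document with Python's standard library and lists the title, link, and publication date of each item. The sample feed content is invented for illustration and does not reproduce the actual SBKB feeds, whose addresses are not given in the text.

```python
import xml.etree.ElementTree as ET

# Invented sample in generic RSS 2.0 layout; not real SBKB feed content.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example structure feed</title>
    <item>
      <title>Hypothetical structure A</title>
      <link>http://example.org/entry/a</link>
      <pubDate>Thu, 21 Apr 2011 00:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Hypothetical structure B</title>
      <link>http://example.org/entry/b</link>
      <pubDate>Thu, 21 Apr 2011 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text: str) -> list:
    """Return a list of dicts (title, link, published) for each RSS item."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_FEED):
        print(entry["published"], entry["title"], entry["link"])
```

To use this with a live feed, the XML text would first be downloaded (for example with `urllib.request.urlopen`) from the feed address published on the site.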
Community-Nominated Targets portal
The PSI:Biology Network invites the biological community to nominate proteins of biological relevance for structural determination. This proposal system begins the process of matching your project with one of 13 high-throughput and membrane protein structure determination centers to carry out the study. Access is available from either the SBKB homepage or at http://cnt.sbkb.org/CNT.
Sequence Comparison and Analysis tool
The Sequence Comparison and Analysis (SCA) tool offers the same functionality as the Community-Nominated Target proposal system described previously, but provides an evaluation report to the author only rather than forwarding it to a selection committee. It also supports batch submission to evaluate hundreds of sequences. This tool can also be found at http://cnt.sbkb.org/CNT.
Functional Sleuth
Functional Sleuth enables further research on proteins in the Protein Data Bank archive whose functions are unknown or minimally characterized. These “structures of unknown function” (SUFs) are currently organized by source organism, and users can choose to display the SUFs from any phylogenetic level from domain to species. Making a selection on a “tree-of-life” image will launch an interactive tree browser that further filters the list of SUFs, and right-clicking on a name will display the gallery of protein structures. You can download a comma-separated values (.csv) file of PDB IDs from your particular gallery, and a full list of all SUF PDB IDs is available at http://sbkb.org/KB/unkstrucs.txt. This feature is updated weekly in conjunction with the weekly PDB release.
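A downloaded list of PDB IDs can be cleaned up programmatically before further analysis. The following Python sketch extracts unique PDB IDs from text in which IDs are separated by commas, spaces, or line breaks. The exact layout of the downloadable files is not specified in the text, so the accepted separators and the handling of header words are assumptions.

```python
import re

# Classic PDB IDs are four characters: a digit followed by three alphanumerics.
PDB_ID_PATTERN = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def parse_pdb_id_list(text: str) -> list:
    """Return unique, upper-cased PDB IDs in order of first appearance."""
    seen = set()
    ids = []
    # Split on any run of whitespace or commas (assumed separators).
    for token in re.split(r"[\s,]+", text):
        token = token.upper()
        # Tokens that are not well-formed IDs (headers, blanks) are skipped.
        if PDB_ID_PATTERN.match(token) and token not in seen:
            seen.add(token)
            ids.append(token)
    return ids
```

Deduplicating while preserving order keeps the result stable from week to week, which makes it easier to compare successive downloads.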
Latest PSI statistics
The PSI Network tracks its success and progress through a series of agreed metrics. These metrics, which include statistics such as the number of protein structures solved and the calculated modeling leverage, were defined by the Goals and Metrics Committee for the PSI-2 Network.
BioSync
BioSync provides technical details about structural biology beamlines at synchrotron radiation facilities. Originally maintained by the RCSB PDB, it is now a part of the SBKB. Progress on future facilities is tracked and information on decommissioned sites is maintained for historical purposes. Links are also provided to related external resources. Summary statistics, based on PDB depositions, are produced and updated weekly. At the beamline level, galleries of structures, tables of citations and general information are also available. Separate statistics are provided for structures solved by structural genomics efforts. This site can be found on the methods hub page from the left navigation menu.
Tutorial and educational resources
The SBKB also has tutorials and classroom exercises to explain how to use the SBKB. These tutorials not only introduce new users to the SBKB, but also give lessons on how to interpret the data being presented. See http://sbkb.org/about/getting_started.html or contact us at email@example.com for more specific requests.
Content in the Structural Biology Knowledgebase
List of web addresses to access the features and underlying portals of the Structural Biology Knowledgebase
SBKB and portal sites
Structural Biology Knowledgebase
Query and reporting; editorials and news
Protein sequence selection and progress
Target trial information and protocols
Protein Model Portal
PSI Technology Portal
PSI Publications Portal
X-ray crystallography methods
Community-Nominated Target proposal system
Community requests for protein structures
Searching the SBKB
We describe the resources used by the SBKB query and reporting mechanism in the context of common examples.
Finding sequence-level or structure-level information about a protein of interest
Conducting a sequence or PDB ID search of the SBKB will yield the following:
Links to information from publicly available biological resources
Visualizing protein structures in an interactive viewer
If a 3D structure exists for a protein of interest, the SBKB currently uses the molecular viewer FirstGlance [108] to explore the molecule and its binding partners (other proteins, ligands, nucleic acids, etc.). This Java-based viewer also provides explanations of the images and representations for the novice user, along with display options (such as “color by B-factors/uncertainty”) for advanced users.
Experimental target tracking databases, TargetDB and PepcDB
The SBKB manages two databases that track structural genomics efforts. TargetDB tracks information on over a quarter million protein sequences, or “targets”, that have been selected for structural determination by worldwide structural genomics projects. This information includes sequence and site information, target experimental status, and timestamps for the latest experimental step (cloned, expressed, purified, etc.). The Protein Expression, Purification, and Crystallization database, PepcDB, provides more detailed information about PSI targets registered in TargetDB (275,000 as of April 2011) by reporting on progress for each experimental trial. Information in PepcDB includes the sequence and site information, the rationale for the target’s selection (biomedical, community-nominated, technology development, PSI:Biology partnership selection, etc.), each target’s experimental history including individual trial details, the protocols used for protein production and structure determination, and the reasons why work was terminated if a 3D structure was not determined. TargetDB and PepcDB are updated weekly with data provided by the PSI centers. They are searchable by TargetID or other popular accession IDs (UniProt, GenBank, PDB; see site for details), and results can be filtered by the data attributes described above. Regular checks are made to ensure that these databases are consistent with relevant information in the PDB and the PSI:Biology-Materials Repository (psimr.asu.edu).
In 2011, a new target history tracking resource that merges TargetDB and PepcDB will be released. Please visit the TargetDB and PepcDB websites for more information on this transition.
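To make the idea of a timestamped experimental history concrete, here is a minimal Python sketch of a target record that stores status events and reports the most advanced step reached. The class, field names, and list of pipeline stages are hypothetical; they paraphrase the stages mentioned in the text and do not reflect the actual TargetDB or PepcDB schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple

# Illustrative ordering of experimental steps (an assumption, not the
# databases' real status vocabulary).
PIPELINE = ["selected", "cloned", "expressed", "purified", "crystallized", "in PDB"]


@dataclass
class TargetRecord:
    target_id: str          # hypothetical identifier
    sequence: str           # one-letter amino-acid sequence
    history: List[Tuple[str, date]] = field(default_factory=list)

    def record(self, status: str, when: date) -> None:
        """Append a timestamped status event, rejecting unknown statuses."""
        if status not in PIPELINE:
            raise ValueError(f"unknown status: {status}")
        self.history.append((status, when))

    def latest_status(self) -> Optional[Tuple[str, date]]:
        """Return the most advanced step reached and its most recent date."""
        if not self.history:
            return None
        return max(self.history, key=lambda e: (PIPELINE.index(e[0]), e[1]))


if __name__ == "__main__":
    target = TargetRecord("EXAMPLE-0001", "MKTAYIAKQR")
    target.record("cloned", date(2011, 1, 5))
    target.record("expressed", date(2011, 2, 1))
    print(target.latest_status())
```

Keeping the full event list rather than only the latest step mirrors the distinction drawn above between a summary status and a per-trial history.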
Links to the PSI:Biology-Materials Repository
The SBKB also searches its partner PSI resource center, the PSI:Biology-Materials Repository, which stores, maintains and distributes protein expression plasmids and vectors created by the PSI centers. It currently holds over 40,000 PSI plasmids and nearly 100 empty vectors available for request, with additional PSI plasmids added on a monthly basis.
If a sequence search yielded no experimental structures for the protein of interest, there are other sources of information to help design future experiments:
Theoretical models from the Protein Model Portal
The current release of the Protein Model Portal (Mar 2011) consists of 15.1 million comparative protein models for 3.8 million distinct UniProt sequences. Queries can be entered as protein sequences in one-letter code or as UniProt accession IDs. Results are presented graphically, indicating the regions of the protein for which structural information (experimental or theoretical) is available, complemented with functional and domain annotation. Detailed technical information about each model is also provided, such as the dates of creation and “latest verification”, the template structure and the target-template alignment on which the model was based, and the expected model accuracy based on the evolutionary distance between the target and the template. Graphs and images that display and assess model quality and reliability are available, and are essential for allowing users to select the best available structural information for a specific application.
Community requests to the PSI
Finding new methods and technologies
The PSI centers have developed and utilized many technologies that facilitated structural determination and analysis. A text search of the SBKB allows a user to find these methods.
The PSI Technology Portal
The PSI Technology Portal provides access to over 200 summaries of key PSI technologies with links to the responsible PSI center or related publication. Reports can be queried by text, by PSI center, or by clicking an image of an experimental pipeline diagram. All tech reports can be “forwarded” to display on a user’s social networking sites, such as Facebook, del.icio.us, and others. A YouTube channel showing videos of PSI technologies in action (www.youtube.com/user/sbkbtech) and a Nature Network discussion forum (network.nature.com/groups/psikb_tech/) have also been created as a means of communicating with users directly.
The PSI Publications Portal
The PSI has published over 1,500 peer-reviewed articles over the past 10 years. These articles are organized by topic (structural or methodological) or by PSI center, and can be searched by author, title, journal, PDB ID, or other attributes. The information presented includes links to the PubMed abstract, the related PDB ID, and the number of times cited; citations can also be downloaded in EndNote format. Text searches of the Publications Portal can also be conducted from the central SBKB search box.
Using the Structural Biology update—research library
As part of the Structural Biology update, SBKB and Nature editors add articles published by leading scientific journals that relate to structural biology and structural genomics. Articles are organized into subjects that range from cloning to automated annotation. Note that the articles listed here require a subscription to the relevant journals.
Tools for the Home Lab: a booklet about methods developed by the PSI
The SBKB presented a four-part “PSI-2 Achievements” series from July 2010 to October 2010 detailing advances in the areas of methods, modeling, structures, and outreach. The methods articles from this series summarized the latest strategies developed for protein isolation and structure determination. We have assembled these articles into a PDF booklet called “Tools for the Home Lab: New Methods from the Protein Structure Initiative”. This book of ideas, which can be downloaded from http://sbkb.org/pdf/PSI-2methods.pdf, is intended to help wet-bench scientists at any level who are interested in protein research.
The SBKB uses an architecture that collects IDs and selected annotations into a central portal database, which is organized to facilitate queries and the addition of new annotations. Each SBKB portal developed its own underlying architecture, following guidelines for common protocols that allow easy data exchange with the central SBKB. With this structure, information can be queried from the SBKB home page or, for more specialized queries, from an individual portal module.
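The general pattern of a central index that gathers identifiers and selected annotations from several portals can be sketched in a few lines of Python. Everything in this example (the class name, the portal names, and the annotation keys) is hypothetical and meant only to illustrate the pattern, not the SBKB's actual database design.

```python
from collections import defaultdict
from typing import Dict


class CentralIndex:
    """Toy central index: identifier -> portal name -> annotations."""

    def __init__(self) -> None:
        self._entries: Dict[str, Dict[str, dict]] = defaultdict(dict)

    def register(self, portal: str, identifier: str, annotations: dict) -> None:
        """Add or update the annotations a portal holds for an identifier."""
        key = identifier.upper()  # normalize so lookups are case-insensitive
        self._entries[key].setdefault(portal, {}).update(annotations)

    def lookup(self, identifier: str) -> Dict[str, dict]:
        """Return all portal annotations for an identifier (empty if unknown)."""
        return dict(self._entries.get(identifier.upper(), {}))


if __name__ == "__main__":
    index = CentralIndex()
    # Hypothetical portal names and annotation keys.
    index.register("targets", "1abc", {"status": "purified"})
    index.register("models", "1ABC", {"model_count": 3})
    print(index.lookup("1abc"))
```

Because each portal writes under its own name, a new annotation source can be added without changing existing entries, which is the property the architecture description emphasizes.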
Remotely accessing the SBKB
The content and functionality of the SBKB may be accessed remotely in a variety of ways. Editorial content may be accessed by linking to SBKB pages that are regularly updated (e.g. http://sbkb.org/update/). RSS feeds are provided to describe both new editorial content and new structure data. Web services are provided to enable program-level access to the search features of the SBKB. Remote access is provided via both the Common Gateway Interface (CGI) and the Simple Object Access Protocol (SOAP). The SBKB also provides an embeddable widget that can be incorporated into a web page at a remote site. The widget provides a dynamically updated display of new articles, features and structure data on the KB site. The protocol interface details and widget installation instructions are described at http://sbkb.org/about/webservices.html.
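A CGI-style interface is typically called by encoding the search parameters in a URL query string. The Python sketch below shows how such a request URL could be assembled with the standard library. The endpoint address and the parameter names (`q`, `type`, `format`) are placeholders invented for this example; the real interface is the one documented on the web services page cited above.

```python
from urllib.parse import urlencode

# Placeholder endpoint; NOT the actual SBKB service address.
BASE_URL = "http://example.org/cgi-bin/search"


def build_search_url(query: str, query_type: str = "text", output: str = "xml") -> str:
    """Build a GET URL with URL-encoded parameters (names are hypothetical)."""
    params = {"q": query, "type": query_type, "format": output}
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("kinase domain"))
    print(build_search_url("1abc", query_type="pdb_id"))
```

The resulting URL could then be fetched with `urllib.request.urlopen` and the response parsed according to whatever format the service returns.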
With the overarching goal of creating new knowledge about the interrelationships of sequence, structure and function, the Structural Biology Knowledgebase captures and highlights the products of the PSI projects for use by the broader biological communities, creates an information repository and web portal that integrates the products of the PSI with publicly available biological information, and encourages collaborative interactions between the PSI and the biological communities. We welcome feedback from the community; users with questions may contact the SBKB at firstname.lastname@example.org.
The SBKB is a resource center within the Protein Structure Initiative and is supported by grant U01 GM093324-01 from the National Institute of General Medical Sciences.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
1. Smith TL (ed) (2000) Structural Genomics Supplement Issue. Nat Struct Biol 7(11s):927–994
5. Goodsell D (2009) PSI featured molecule series. Available from: http://sbkb.org/KB/structures.jsp
6. Reddy P (2004) In: Bidgoli H (ed) The internet encyclopedia, vol 2 G-O. Wiley, Hoboken, NJ, pp 298–310
10. The Open Protein Structure Annotation Network (2009). Available from: http://www.topsan.org/
11. Binkowski A (2009) Global protein surface survey. Available from: http://gpss.mcsg.anl.gov/
12. Fischer M (2009) NESG function annotation server. Available from: http://luna.bioc.columbia.edu/honiglab/nesg/cgi-bin/browse.pl
13. Functional Analysis Server at the NYSGXRC (2009). Available from: http://www.nysgxrc.org/functional/
47. Huhne R, Koch FT, Suhnel J (2007) A comparative view at comprehensive information resources on three-dimensional structures of biological macro-molecules. Brief Funct Genomic Proteomic 6(3):220–239
52. Kiefer F et al (2009) The SWISS-MODEL Repository and associated resources. Nucleic Acids Res 37(Database issue):D387–D392
66. Nikolskaya AN et al (2006) PIRSF family classification system for protein functional and evolutionary analysis. Evol Bioinform Online 2:197–209
79. NextBio (2009). Available from: http://www.nextbio.com/
80. Oxford GlycoProteomics 2-DE database (2009). Available from: http://proteomewww.bioch.ox.ac.uk/2d/2d.html
81. Human Cornea 2-DE database (2009). Available from: http://www.cornea-proteomics.com/
82. DOSAC-COBS 2D-PAGE database (2009). Available from: http://www.dosac.unipa.it/2d/
83. Parasite host cell interaction 2D-PAGE database (2009). Available from: http://www.gram.au.dk/2d/2d.html
84. Purkyne Military Medical Academy 2D-PAGE database (2009). Available from: http://www.pmma.pmfhk.cz/2d/2d.html
85. Reproduction 2D-PAGE (2009). Available from: http://reprod.njmu.edu.cn/cgi-bin/2d/2d.cgi
86. Bini L et al (2009) 2D-PAGE database from the Department of Molecular Biology, University of Siena, Italy. Available from: http://www.bio-mol.unisi.it/2d/2d.html
108. Martz E (2009) FirstGlance in Jmol. Available from: http://firstglance.jmol.org
113. Framework for Handling PSI-2 Community Nominated Targets (2008). Available from: http://sbkb.org/KB/index1.jsp?pageshow=62