A Tour of the PSI-Nature
Structural Genomics Knowledgebase
The PSI-Nature Structural Genomics Knowledgebase (PSI SGKB) is designed to turn the products of the Protein Structure Initiative into knowledge that is important for understanding living systems and disease. This "one-stop shop" provides users with the available genetic, structural, functional and experimental information about a particular protein of interest.
This walkthrough will introduce you to the features and search capabilities of the PSI SGKB.
Navigating the PSI SG Knowledgebase
The PSI SGKB homepage makes many features available from one central place.
Features available on the PSI SGKB homepage
The central search box is the main entry point for finding information about a protein. You can search by protein or nucleotide sequence, by PDB ID (the identifier of a Protein Data Bank atomic 3D coordinates file), or by text. These options are described in the second half of this tutorial.
Central view: Structural Genomics Update - a view of available research highlights this month.
Left Navigation Menu
Provides access to the structural genomics update content, information about the PSI, its centers, and its resources.
Right navigation menu
E-alerts: subscribe to Nature's e-mail alert service or download RSS feeds that list the monthly content and weekly structure updates.
Functional Sleuth: shows all of the PSI structures solved by the large-scale centers that do not yet have functional annotation information.
Propose Targets: a way for groups outside the PSI to submit targets and benefit from the PSI's high-throughput structure determination pipeline.
Latest PSI Statistics: a count of the protein structures solved by the PSI efforts.
See Latest Structures: a list of the PSI structures released for public use, updated weekly. (Also available as an RSS feed.)
A closer look at features to browse
Functional Sleuth
In Functional Sleuth, we present structures determined by the PSI structural genomics efforts whose functions are still unknown. Clicking on a structure in the gallery queries the Knowledgebase and provides a starting point for exploring that structure.

Propose Targets
The PSI Centers have developed high-throughput protein production and structure determination pipelines, along with new experimental methods, to overcome a number of experimental bottlenecks. The wider scientific community is invited to benefit from these efforts. Each PSI Center accepts target nominations for structure determination, which are vetted for feasibility and consistency with the overall PSI goals. Investigators can create an account and submit their target proposals using this feature on the PSI SGKB; a decision is usually received within one month. Proposals accepted for structure determination must adhere to the PSI rules, most notably that structural data must be deposited in the public database, the PDB, within 4 weeks of completion of the structure.

See Latest Structures
As a rule, the PSI centers must deposit their protein structures to the Protein Data Bank within 4 weeks of their completion so that these structures can quickly reach the biological communities that use them for clinical and basic studies. The "See Latest Structures" feature shows all structures released that week. This list can also be delivered to your web browser in an RSS feed.
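For readers who prefer to process the weekly list in a script rather than a browser, the sketch below shows one way to read titles and links out of an RSS 2.0 document using Python's standard library. The feed content here is a made-up stand-in: this tutorial does not give the real feed's URL or fields, so SAMPLE_FEED and list_feed_items are illustrative names only.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed content for illustration only; the real feed's URL,
# titles, and fields are not given in this tutorial.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example weekly structure list</title>
    <item>
      <title>Example structure A</title>
      <link>http://example.org/structures/a</link>
    </item>
    <item>
      <title>Example structure B</title>
      <link>http://example.org/structures/b</link>
    </item>
  </channel>
</rss>"""


def list_feed_items(rss_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        items.append((title, link))
    return items


for title, link in list_feed_items(SAMPLE_FEED):
    print(title, link)
```

In practice the feed text would be downloaded from the address offered on the site and passed to the same function.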

The Structural Genomics Update page
The Structural Genomics Update keeps readers current in advances made by the PSI and in the fields of structural genomics and structural biology. It delivers editorials describing the latest research findings and technical highlights, an Events Calendar, recent articles from Nature News, and a Research Library categorized by experimental topic.
The Search box is also available on the right side of the page, so users can search for anything of interest that they come across while browsing the articles and sites on the PSI SGKB.

Features available on the Structural Genomics Update site
Research Advances
Each month, the Nature Publishing Group writes editorials about recent protein structures and new techniques and methods, focusing on topics of broad scientific interest. Protein-related research highlights published in various Nature journals are also shared here.
Featured Molecule
Each month, the Featured PSI Molecule gives a detailed portrait of a biological molecule solved by the PSI structural genomics efforts. Using interactive illustrations and generalized explanations, each article describes the features of these biologically significant targets for students of all ages.

Research Library
The Research Library is a catalog of all PSI publications to date, in addition to recent structural results and technological advances from the broader structural biology community. Updated monthly, this specialized resource is organized by subject (see below) so that users can find papers on solutions to problems in the protein pipeline, ranging from a new DNA vector for protein expression, to novel NMR or x-ray structure determination methods, to new protein function prediction resources.

News
To present a broader view of current science, we provide the latest news from the Protein Structure Initiative, Nature News and other NPG publications. The monthly newsletter, "PSI in the Spotlight", contains PSI and NIGMS news items such as funding announcements, press releases, new policies (or changes to existing ones), and conference reports.

Calendar of Events
An events calendar keeps the community in touch with upcoming conferences, events, and workshops that promote a structural view of biology.

Further Information about the Protein Structure Initiative

The PSI SGKB also contains information regarding the PSI program, its mission, and its policies, found in the left navigation menu under "About PSI".
The "About PSI" site also displays active funding opportunities, either to become part of the PSI efforts or to collaborate with current consortia, with links to the NIH/NIGMS announcements and notices.
If you are interested in the specific projects and initiatives of a particular PSI Center, the "PSI Center" menu provides links to each center.
Searching the PSI SG Knowledgebase
The PSI-Nature SGKB can be searched by one-letter code protein sequence, nucleotide sequence, plain text and Protein Data Bank identifier (PDB ID) code. The following section describes how to use these search options.
The PSI SGKB consists of a main searchable database linked with modules (PSI resources) that provide additional information about the query terms.

Searchable by sequence and PDB ID:
Experimental Data Tracking databases
TargetDB and PepcDB
Structures from the PDB
Annotations from external biological resources
Protein Model Portal - homology models
Materials Repository - DNA clones
Searchable by text:
Technology Portal - a repository of technical reports and methods provided by the PSI centers, searchable by center and by experimental step.
Publications Portal - a list of all articles published by the PSI centers.
PSI Centers - search text from within the PSI centers websites
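As a compact summary of the list above, the sketch below records which resources each query type reaches. The dictionary keys and the function name are illustrative labels chosen here, not identifiers used by the PSI SGKB itself.

```python
# Resources reached by each query type, as listed in this tutorial.
# The keys ("sequence", "pdb_id", "text") are illustrative labels only.
SEQUENCE_AND_PDB_ID_RESOURCES = [
    "Experimental Data Tracking databases (TargetDB and PepcDB)",
    "Structures from the PDB",
    "Annotations from external biological resources",
    "Protein Model Portal",
    "Materials Repository",
]

SEARCHABLE_BY = {
    "sequence": SEQUENCE_AND_PDB_ID_RESOURCES,
    "pdb_id": SEQUENCE_AND_PDB_ID_RESOURCES,
    "text": [
        "Technology Portal",
        "Publications Portal",
        "PSI Centers",
    ],
}


def resources_for(query_type):
    """Return the resources searched for a given query type."""
    try:
        return list(SEARCHABLE_BY[query_type])
    except KeyError:
        raise ValueError(f"unknown query type: {query_type!r}") from None


print(resources_for("text"))
```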
Next, we will discuss searching these features in detail.
Searching by Sequence or PDB ID
The PSI-Nature SGKB maintains a database of the sequences of PSI protein targets and the sequences of all solved protein structures released by the Protein Data Bank. Sequence searches are performed using the BLASTP program with an E-value cutoff of 10. To search for a particular protein sequence, enter the one-letter amino acid sequence in the search form, select the by Sequence radio button and press Search. Nucleotide sequence searches are also supported, using the BLASTX program to determine possible reading frames and display closely matched protein sequences.
An example query is available by selecting the by Sequence radio button, pressing "example query", and then pressing the Search button. These options are highlighted in the figure below.
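The sketch below illustrates the two decisions described above: whether a query looks like a protein or a nucleotide sequence (and so whether BLASTP or BLASTX applies), and which hits survive an E-value cutoff of 10. How the site itself distinguishes the two sequence types is not documented here, so the letter-based guess is an assumption, as are all function names and the example hits. Note that a short peptide made only of the letters A, C, G, T, U or N would be misread as nucleotide by this simple rule.

```python
# Assumption: input made only of these letters is treated as nucleotide.
NUCLEOTIDE_LETTERS = set("ACGTUN")
E_VALUE_CUTOFF = 10.0  # cutoff stated in this tutorial


def clean_sequence(raw):
    """Drop FASTA header lines and whitespace; return uppercase letters."""
    lines = [line.strip() for line in raw.splitlines()]
    residues = "".join(line for line in lines if not line.startswith(">"))
    return "".join(residues.split()).upper()


def choose_program(sequence):
    """Guess 'blastx' for nucleotide-only input, 'blastp' otherwise."""
    if not sequence:
        raise ValueError("empty sequence")
    if set(sequence) <= NUCLEOTIDE_LETTERS:
        return "blastx"
    return "blastp"


def keep_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep (hit_id, e_value) pairs at or below the cutoff, best first."""
    return sorted((hit for hit in hits if hit[1] <= cutoff),
                  key=lambda hit: hit[1])


query = clean_sequence(">example\nmkt ay\nIAK")
print(query, choose_program(query))
# Hypothetical hits, invented to show the filtering step.
print(keep_hits([("hit_a", 12.0), ("hit_b", 1e-5), ("hit_c", 0.5)]))
```

A lower E-value means the match is less likely to have arisen by chance, so the filtered list is sorted with the most significant hit first.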

The PSI-Nature SGKB maintains a database of the identifier codes for all experimental structure entries released by the Protein Data Bank. To search for a particular Protein Data Bank entry, enter the structure's 4-character ID code in the search form, select the by PDB id radio button and press Search. An example query (2I9Y) is available on the site to explore these features.
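Before submitting a PDB ID, it can help to check that it has the expected shape. The sketch below assumes the classic 4-character format, a digit followed by three letters or digits (as in 2I9Y); the site's own validation rules are not described in this tutorial, and normalize_pdb_id is an illustrative name.

```python
import re

# Assumption: the classic PDB ID format, one digit followed by three
# letters or digits (for example 2I9Y).
PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")


def normalize_pdb_id(text):
    """Return the ID in upper case, or raise ValueError if malformed."""
    candidate = text.strip()
    if PDB_ID_PATTERN.fullmatch(candidate) is None:
        raise ValueError(f"not a 4-character PDB ID: {text!r}")
    return candidate.upper()


print(normalize_pdb_id(" 2i9y "))
```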
Results of a Sequence or PDB ID Search
The results of sequence and PDB ID searches are first displayed as a summary of available records relating to the input query. An example of a Results Summary is shown below.

To view query result details individually, select the DB REPORT tab at the top of the summary page. From this summary, you can view the type of information you seek:
1. Structures - displays a list of experimental structures within the PDB
2. Annotation - displays known genetic, structural, and functional information for each experimental structure
3. Models - supplied by the Protein Model Portal (http://www.proteinmodelportal.org), displays computational models related to the sequence
4. Targets - supplied by the experimental data tracking (EDT) database TargetDB (http://targetdb.pdb.org), displays information on the experimental progress and status of targets selected for structure determination.
5. Protocols - supplied by the EDT database PepcDB (http://pepcdb.pdb.org), displays status history, stop conditions, reusable text protocols and contact information collected from the NIH PSI and other structural genomics centers.
6. Materials - supplied by the PSI Materials Repository (http://www.hip.harvard.edu/PSIMR/), displays DNA clones available for purchase.
The Structures Tab
The Structures tab of the DB Report provides the essential details about any structures matching the input query. For a sequence search, each matching structure entry also shows its percent sequence identity with the input sequence (I), that is, the percentage of exactly matching residues, together with the E-value (E).
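To make the identity figure concrete, the sketch below computes percent identity from two already-aligned sequences. It uses one common convention, dividing the number of identical residue pairs by the number of alignment columns (gaps included); the exact convention used for the (I) value on the site is not stated in this tutorial, and the two short sequences are invented for illustration.

```python
def percent_identity(aligned_query, aligned_hit, gap="-"):
    """Percent of alignment columns holding the same residue in both rows.

    Both strings must come from the same pairwise alignment, so they have
    equal length; columns containing a gap never count as matches.
    """
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for q, h in zip(aligned_query, aligned_hit)
        if q == h and q != gap
    )
    return 100.0 * matches / len(aligned_query)


# Invented alignment: 7 identical columns out of 9.
print(round(percent_identity("MKT-AYIAK", "MKTQAYLAK"), 1))
```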
The Structures section presents:
- a link to the RCSB PDB Structure Explorer Page,
- a link to a collection of molecular images from the RCSB (when available), and
- a download option for the PDB format structure data file.
Other reference information includes:
- PubMed and DOI for the primary citation (when available),
- Title of the deposited structure (may not be the same as the related publication),
- Authors
- Structure entry deposition and release dates, and
- Experimental method used to obtain the model.
If the structure was solved by a PSI project then this information is provided along with the associated PSI Target identifier. There is also a glossary of terms available in the upper right hand corner which defines these headings. A glossary is present for each tab.

To view the other reports, click on their tab headings (Annotation, Models, etc.)
The Annotations Tab
The Annotation(s) section presents a collection of links containing structural and functional annotations for the matching structure entry.
In the figure of a typical annotation section below, links are provided to the databases PDBsum (comprehensive protein structure summary), Proteopedia (online protein encyclopedia), Pfam (a protein family and motifs database), InterPro (protein family assignment), and Gene3D (predictive structural annotation).
From this view the user can see what annotation databases have data relating to the sequence, and can go directly to the record by following the link.
The Glossary of Terms, available in the top-right corner, defines these headings; in this case, the glossary describes what kind of information each linked database provides.

The Models Tab
Computational Models associated with a query sequence or structure are shown in this section.
In the case of a sequence query, the number of models that have been predicted for this sequence is presented along with a link to the details for each model. In the case of a PDB ID query, the number of computational models that are based on information from this experimental structure is presented.
All of these results are obtained by a remote query to the PSI Protein Model Portal, which collects and maintains this information. In the example below, there are 5 models from two modeling databases available. To explore, follow the "view" link to go to the PSI Protein Model Portal.
Example: perform a search on the example PDB ID entry, "2I9Y"
Step 1: Once you see the results of your search, follow the "view" link.

Step 2: Because 2I9Y is an experimentally determined structure, the results list models that used it as a template, here for the protein sequence with the UniProt Accession ID Q3E7W3. Because 2I9Y is itself the structure of protein Q9SSK9, some results also show that sequence modeled on other structure "templates".

The full Model report from the Protein Model Portal is as follows:

The Sequence Summary:
red: your query
blue: the model you are viewing.
This model consists of residues 20-170 of your query sequence.
Domain Annotation:
Reports which protein domains are recognized in your query sequence, with a link to InterPro for further information.
Structural Model:
The computational model is presented, with information related to its creation. You can also display an interactive view of the model and download its coordinates for further evaluation.
Target-Template Alignment:
The target-template alignments provided on the model info pages are generated dynamically by structural superposition of the model and template structures using the program MAMMOTH.
The Targets Tab
Information about matching structural genomics targets is shown in the Targets tab of the DB report.
The information provides the user with a status summary of the work performed on the target already. Information in this summary includes:
- the TargetID, with a link to the record in TargetDB
- the protein sequence alignment between your query sequence and similar sequences found in the database
- reported target status
- source organism
- and PSI Target Category
A Glossary of Terms is available in the top right corner that defines these headings.
You can read the full record by clicking on the TargetID in the report (e.g. GO.74365).

The full Targets report from TargetDB is as follows:

General information, such as when the latest update occurred, the responsible center, status information, source organism and target sequence.
If the target's experimental structure was successfully determined, a link to the RCSB PDB Structure Explorer page is also given.
Links to domain annotation and function prediction databases are provided, along with calculated biochemical and biophysical parameters for the sequence.
The Protocols Tab
The Protocols section provides links to the Protein Expression Purification and Crystallization Database (PepcDB).
The information provided in this tab expands upon the information listed in the Targets tab by providing links to the experimental protocols. Information in this summary includes:
- the TargetID, with a link to the record in PepcDB
- the protein sequence alignment between your query sequence and similar sequences found in the database
- links to the protocols used at each step of protein production and structure determination
Each experimental step is a link to a detailed protocol used by the structural genomics center. These protocols can suggest an experimental strategy that shortens the time needed to obtain protein samples for further research.

A Glossary of Terms is available in the top right corner that defines these headings.
You can read the full report by clicking on the TargetID, or read the individual protocols used during the production of this protein by clicking on an experimental step (e.g. expression).
The full Protocols report from PepcDB is as follows:

General information, such as the TargetID, responsible center, and UniProt entry name.
Other useful information includes the CloneID, and a link to purchase the target DNA clone, available through the PSI Materials Repository.
Then, it provides derived protein information that may elucidate structure and function, as in the Targets tab.
The novel feature is the experimental summary of this target - number of trials attempted, how far the trial progressed (and if work was stopped), as well as the protocols used during the protein production process.
Since the search query can begin from a protein sequence of interest, this database will show which protocols were successful (or unsuccessful) on similar sequences.
In this way, PepcDB can be used as a tool for experimental design.
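As a rough illustration of this kind of use, the sketch below tallies at which step work stopped across a handful of trials on similar sequences. The records, target names, step names, and field layout are all invented for the example and do not reflect PepcDB's actual schema or data.

```python
from collections import Counter

# Hypothetical records: (target_id, furthest_step_reached, work_stopped).
# The IDs, step names, and layout are invented for illustration and are
# not PepcDB's actual schema.
TRIALS = [
    ("EXAMPLE-1", "expression", True),
    ("EXAMPLE-2", "purification", True),
    ("EXAMPLE-3", "crystallization", False),
    ("EXAMPLE-4", "expression", True),
]


def stopped_by_step(trials):
    """Count, per step, how many trials stopped after reaching that step."""
    return Counter(step for _, step, stopped in trials if stopped)


for step, count in stopped_by_step(TRIALS).most_common():
    print(step, count)
```

A step where many similar targets stalled may be the one that deserves a different protocol in a new experimental plan.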
The Materials Tab
The Materials tab provides information about the availability of relevant target DNA clone materials at the PSI Materials Repository (PSI MR). The PSI MR provides an on-line searchable database of archived PSI genetic materials; handles the transfer, storage and maintenance of PSI plasmids in a highly quality-controlled manner at centralized on-site and off-site locations; and offers the facilities to distribute PSI plasmids and supporting information for research purposes within the U.S. and abroad.
From our initial search example, the PSI MR has two target clones available to order.

The information provided in this tab:
- the TargetID, with a link to the record in TargetDB
- A link to order the clone
- A link to a detailed record about the target's DNA sequence (DNA insert).
- A link to information about the DNA vector in which the target sequence resides.
Selecting one of the last three links will transfer you to the PSI-MR PlasmID website.
To see further information about this DNA clone and the vector, including antibiotic resistance for positive selection, click on the Clone Details link.

Searching the PSI SGKB using plain text
The PSI-Nature SGKB maintains a 'plain text' index of all content in webpages and documents at the PSI Center websites, the PSI Technology and Publications Portals, and the Annotations Module.
To search the PSI-Nature SGKB by plain text, enter the appropriate words in the search form, select the by Text radio button and press Search. An example query (ATP Kinase) is available by selecting the "by plain text" radio button, selecting the example query link, and pressing the Search button.

The results of the text search are presented as a list of pages containing the input search term (e.g. ATP kinase), as shown below.
In the Site Search, all instances of 'ATP Kinase' that occur on the PSI centers' websites are found, including PowerPoint presentations and a targets summary.

Clicking on the Structural Publication tab will show all structural PSI-published articles that contain the query term; in this case, all structural publications that contain the term ATP Kinase.
These records include links to protein structures that contain the search term as well. The PubMed identifier, DOI number, and PubMed Central links to the article are provided when available, and by selecting the "Read More" link, the full citation and abstract of the article will appear.

Clicking on the Methodological Publication tab will show all PSI-published articles and reports containing the search term that focus on methodology. By selecting the "Read More" link, the full citation will be shown. In this way, you can search for new methods developed by the PSI efforts to help your own research.

Lastly, explore the site on your own.
This tutorial has walked through the features you can use in your own research. With this "one-stop shop", you can find many kinds of help, from structural and annotation information about your protein to reports and protocols describing how to obtain it.
If you have any questions or comments, or would like to suggest future features for the PSI SGKB, please contact us at psi-sgkb@nature.com.
