The fact that cluster members have similarity over their whole length makes it likely that they are evolutionarily related and may have similar properties and functions.
The tabs at the top of the interface indicate the various searches that may be performed:
We'll start by looking at searching by Sequence using BLAST.
When we search by sequence, we search only against the cluster leaders, which is significantly faster than searching against all cluster members. After finding clusters that are similar to the query sequence, the various properties of those clusters can be explored using the Bluster interface.
Select the sequence below and paste it into the BLAST search window. Set the E threshold to 0.001 and leave the other BLAST search parameters at their default values.
>gi|28634|emb|CAA32891.1| crystallin [Homo sapiens]
MDVTIQHPWFKRTLGPFYPSRLFDQFFGEGLFEYDLLPFLSSTISPYYRQSLFRTVLDSGISEVRSDRDK
FVIFLDVKHFSPEDLTVKVQDDFVEIHGKHNERQ
Your form should now look like this:
Click on "Submit Query" to initiate the search. If the server is not under high load, the search should complete in 30-60 seconds.
Below are excerpted the first few leaders in the result set:
The "Download Original BLAST Report" link provides the original output of the NCBI BLAST software. The "Download FASTA" link provides a FASTA file of all the matching leaders in the result set.
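The downloaded FASTA file can be processed with a few lines of Python. The following is a minimal sketch, assuming the download is in standard FASTA format with NCBI-style header lines like the query sequence used above; the helper name parse_fasta is hypothetical and is not part of the Bluster site.

```python
def parse_fasta(text):
    """Split FASTA text into (header, sequence) pairs.

    Assumes standard FASTA: a '>' header line followed by one or more
    sequence lines. Sequence lines are joined into a single string.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# The query sequence from this tutorial, used here as sample input.
example = """>gi|28634|emb|CAA32891.1| crystallin [Homo sapiens]
MDVTIQHPWFKRTLGPFYPSRLFDQFFGEGLFEYDLLPFLSSTISPYYRQSLFRTVLDSGISEVRSDRDK
FVIFLDVKHFSPEDLTVKVQDDFVEIHGKHNERQ
"""

records = parse_fasta(example)
for header, sequence in records:
    print(header, len(sequence))
```

To use it on a real download, read the saved file into a string and pass that string to parse_fasta in place of the sample text.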
The table is derived from the NCBI BLAST report, with some annotations added. The columns are:
The cluster view consists of several sections. The image above shows the following:
The final section of the cluster view is a display of the taxonomy tree for each identified protein (proteins of unknown taxonomy are omitted from this view). Below is an indicative portion of the tree diagram.
The diagram is a tree based on the full taxonomic identification for all proteins in the cluster, with the tree branching where the taxonomies associated with the members differ. The bracketed number next to each taxonomic node indicates the number of proteins below that taxonomic node, for example Bilateria [8]. There can be multiple proteins in a cluster associated with a single taxonomic node. The lowest common node in the taxonomy tree is highlighted, in this case "Bilateria". Because the proteins in a cluster are generally evolutionarily related, this can be expected to give some insight into the origins and divergence of this protein cluster.
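The idea behind the highlighted node and the bracketed counts can be sketched in a few lines of Python. This is an illustration only, not the site's implementation: it assumes each protein's taxonomy is available as a root-to-leaf list of names, that node names are unique within the tree, and the three lineages below are simplified, made-up examples rather than data from an actual cluster.

```python
from collections import Counter


def lowest_common_node(lineages):
    """Return the deepest taxon shared by every lineage, or None.

    Each lineage is a list of taxon names ordered from root to leaf.
    """
    common = None
    # Walk the lineages level by level until they stop agreeing.
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            common = ranks[0]
        else:
            break
    return common


def node_counts(lineages):
    """Count how many proteins fall at or below each taxonomic node."""
    counts = Counter()
    for lineage in lineages:
        for taxon in lineage:
            counts[taxon] += 1
    return counts


# Simplified, illustrative lineages (intermediate ranks omitted).
lineages = [
    ["Eukaryota", "Metazoa", "Bilateria", "Chordata", "Mammalia", "Homo sapiens"],
    ["Eukaryota", "Metazoa", "Bilateria", "Chordata", "Mammalia", "Mus musculus"],
    ["Eukaryota", "Metazoa", "Bilateria", "Arthropoda", "Drosophila melanogaster"],
]

print(lowest_common_node(lineages))
print(node_counts(lineages)["Chordata"])
```

With these sample lineages the tree first branches below Bilateria, so that is the lowest common node, and the count for Chordata covers the two mammalian proteins.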
As an example, we'll search for proteins that are common to mammals but not all vertebrates, are associated with neural development, have domain associations and have not been studied in great detail.
To do this we will search for clusters that:
A search form filled out as described above is shown below. Please fill out the form accordingly and click "Submit Query". The search will take between 30 and 60 seconds.
Note that in Bluster Annotation searches, a protein is considered to have a literature association only if the associated literature has 12 or fewer total associations. Literature with more associations tends to come from broader studies, often EST or genomic, and is thus uninformative about the specific protein with which it is associated.
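The 12-association rule can be expressed as a simple filter. The sketch below is a hedged illustration, assuming that for each reference we know the total number of proteins it is associated with; the function names and the reference identifiers are hypothetical placeholders, not real PubMed IDs or part of the Bluster site.

```python
# Threshold stated in the text: a reference counts only if it is
# associated with 12 or fewer proteins in total.
MAX_ASSOCIATIONS = 12


def specific_references(association_counts, limit=MAX_ASSOCIATIONS):
    """Return the references that pass the association limit, sorted by ID.

    association_counts maps a reference ID to the total number of
    proteins associated with that reference.
    """
    return sorted(
        ref for ref, total in association_counts.items() if total <= limit
    )


def has_literature_association(association_counts, limit=MAX_ASSOCIATIONS):
    """True if at least one reference passes the association limit."""
    return bool(specific_references(association_counts, limit))


# Placeholder reference IDs with made-up association totals.
counts = {"ref_a": 3, "ref_b": 12, "ref_c": 13, "ref_d": 250}

print(specific_references(counts))
print(has_literature_association({"ref_d": 250}))
```

Here ref_c and ref_d are dropped because they are associated with more than 12 proteins, and a protein whose only reference is such a broad study is treated as having no literature association.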
When the search is completed, a list of clusters meeting the search criteria is returned. In this case, we find 15 clusters that meet the criteria; the first result is shown below.
The results are sortable by the leader or by the number of members in the cluster. Note that for PubMed IDs there are two columns: one for all PubMed references and one for references that refer to fewer than 13 proteins, as described above. Clicking on the leader will take you to the cluster view, similar to that described previously in the sequence search section.