Bluster is an application for the exploration of protein sequences grouped by similarity over their full length, as described in the publication:

The clustering process creates groups of proteins that are similar over their full length. Clusters can contain just one protein or many and are identified by the ID of one arbitrarily chosen member called the "leader".

The fact that cluster members have similarity over their whole length makes it likely that they are evolutionarily related and may have similar properties and functions.
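One way to make "similar over their full length" concrete is to check how much of each sequence a pairwise alignment covers. The sketch below assumes alignment coordinates in the style of BLAST tabular output (1-based, inclusive start and end positions) and uses a hypothetical 80% minimum coverage. It only illustrates the idea and is not Bluster's actual clustering criterion, which is described in the publication.

```python
def coverage(start: int, end: int, length: int) -> float:
    """Fraction of a sequence covered by an alignment.

    start/end are 1-based inclusive positions (BLAST convention); min/max
    makes the function also accept reversed coordinates.
    """
    lo, hi = min(start, end), max(start, end)
    return (hi - lo + 1) / length


# Hypothetical threshold, chosen only for illustration.
MIN_COVERAGE = 0.8


def full_length_match(qstart, qend, qlen, sstart, send, slen,
                      min_cov=MIN_COVERAGE):
    """True if the alignment spans at least min_cov of BOTH sequences."""
    return (coverage(qstart, qend, qlen) >= min_cov
            and coverage(sstart, send, slen) >= min_cov)


if __name__ == "__main__":
    # A 100-residue query aligned over residues 3-98 to a 105-residue subject.
    print(full_length_match(3, 98, 100, 5, 101, 105))
    # A local match covering only one domain of a much longer protein.
    print(full_length_match(1, 100, 100, 200, 299, 600))
```

Requiring coverage of both sequences is what distinguishes a full-length relationship from a shared domain: in the second call the query is fully covered, but only about a sixth of the subject is.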

Using Bluster

When you go to the Bluster start page you will see a display like this:

The tabs at the top of the interface indicate the various searches that may be performed:

We'll start by looking at searching by Sequence using BLAST.

Sequence Searching

Click on the Sequence tab to search by sequence.

When we search by sequence, the query is compared only against the cluster leaders, which makes the search significantly faster than comparing it against every cluster member. After clusters similar to the query have been found, their properties can be explored using the Bluster interface.
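The speed-up comes from searching a much smaller set: one sequence per cluster is compared with the query, and the full membership of each hit cluster is recovered afterwards from the clustering. A minimal sketch of that two-step idea, assuming a hypothetical in-memory mapping from leader ID to member IDs and a caller-supplied `search` function standing in for BLAST (both are illustrative, not Bluster's internals):

```python
from typing import Callable, Dict, List

# Hypothetical clustering: leader ID -> IDs of all members (leader included).
Clusters = Dict[str, List[str]]


def search_by_leaders(query: str,
                      clusters: Clusters,
                      search: Callable[[str, List[str]], List[str]]
                      ) -> Dict[str, List[str]]:
    """Run `search` against the leaders only, then expand each hit to its cluster.

    `search` stands in for a BLAST run: it receives the query and the list of
    leader IDs and returns the leader IDs it considers hits.
    """
    leaders = list(clusters)              # one sequence per cluster
    hit_leaders = search(query, leaders)  # far fewer comparisons than all members
    return {leader: clusters[leader] for leader in hit_leaders}


if __name__ == "__main__":
    clusters = {"P10": ["P10", "P11", "P12"], "P20": ["P20"], "P30": ["P30", "P31"]}
    sequences = {"P10": "MDVTIQHPW", "P20": "GGGGSSSS", "P30": "QHPWFKR"}

    # Toy stand-in for BLAST: a leader is a hit if it contains the query motif.
    def toy_search(query, leaders):
        return [lid for lid in leaders if query in sequences[lid]]

    print(search_by_leaders("QHPW", clusters, toy_search))
```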

Select the sequence below and paste it into the BLAST search window. Set the E threshold to 0.001 and leave the other BLAST search parameters at their default values.

>gi|28634|emb|CAA32891.1| crystallin [Homo sapiens]
MDVTIQHPWFKRTLGPFYPSRLFDQFFGEGLFEYDLLPFLSSTISPYYRQSLFRTVLDSGISEVRSDRDK
FVIFLDVKHFSPEDLTVKVQDDFVEIHGKHNERQ

Your form should now look like this:

Click on "Submit Query" to initiate the search. If the server is not under high load, the search should complete in 30-60 seconds.

The first few leaders in the result set are excerpted below:

The "Download Original BLAST Report" link provides the raw output of the NCBI BLAST software. The "Download FASTA" link provides a FASTA file of all the matching leaders in the result set.
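FASTA is a simple format: each record starts with a header line beginning with `>`, followed by one or more sequence lines that are concatenated. A minimal standard-library parser for a downloaded FASTA file, keyed by the full header text (the function name is illustrative):

```python
from typing import Dict, List, Optional


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}.

    The header is the text after '>' on the definition line; sequence
    lines are stripped of surrounding whitespace and joined.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:          # close the previous record
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:                  # close the last record
        records[header] = "".join(chunks)
    return records


if __name__ == "__main__":
    query = """>gi|28634|emb|CAA32891.1| crystallin [Homo sapiens]
MDVTIQHPWFKRTLGPFYPSRLFDQFFGEGLFEYDLLPFLSSTISPYYRQSLFRTVLDSGISEVRSDRDK
FVIFLDVKHFSPEDLTVKVQDDFVEIHGKHNERQ"""
    for header, seq in parse_fasta(query).items():
        print(header, len(seq))
```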

The table is derived from the NCBI BLAST report, with some annotations added. The columns are:

Columns with an arrow in their header can be sorted by clicking on that header.
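Because the table is derived from the NCBI BLAST report, a downloaded report can also be re-filtered offline, for example to apply a stricter E threshold than 0.001. The sketch below assumes the report is in BLAST+ tabular format (`-outfmt 6`, whose default 12 columns end with `evalue` and `bitscore`). Whether Bluster's "Original BLAST Report" uses this format is an assumption; the default pairwise text output would need a different parser.

```python
import csv
import io
from typing import List, NamedTuple

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


class Hit(NamedTuple):
    subject: str
    evalue: float
    bitscore: float


def hits_below(report: str, max_evalue: float = 0.001) -> List[Hit]:
    """Return hits with E-value <= max_evalue, lowest E-value first."""
    hits = []
    for row in csv.reader(io.StringIO(report), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (as in -outfmt 7)
        rec = dict(zip(FIELDS, row))
        evalue = float(rec["evalue"])
        if evalue <= max_evalue:
            hits.append(Hit(rec["sseqid"], evalue, float(rec["bitscore"])))
    return sorted(hits, key=lambda h: h.evalue)


if __name__ == "__main__":
    # Two invented rows, for illustration only.
    report = (
        "query\tSUBJ1\t98.1\t104\t2\t0\t1\t104\t1\t104\t1e-70\t210\n"
        "query\tSUBJ2\t30.0\t90\t60\t3\t5\t94\t10\t99\t0.5\t25.4\n"
    )
    for hit in hits_below(report):
        print(hit.subject, hit.evalue)
```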

Clicking the Q6EWI1 cluster link brings up the details of the cluster with Q6EWI1 as its leader.

The cluster view consists of several sections; the image above shows the following:

Following that section, available annotations for all of the individual members of this cluster are listed, and the table is sortable by ID, Name and Taxonomy. The beginning of the table is shown below.

The final section of the cluster view is a display of the taxonomy tree for each identified protein (proteins of unknown taxonomy are omitted from this view). Below is an indicative portion of the tree diagram.

The diagram is a tree built from the full taxonomic lineage of every protein in the cluster, branching wherever the members' lineages diverge. The bracketed number next to each taxonomic node gives the number of proteins below that node, e.g. Bilateria [8]. Several proteins in a cluster can be associated with the same taxon. The lowest node common to all members is highlighted, in this case "Bilateria". Because the proteins in a cluster are generally evolutionarily related, this node can give some insight into the origin and divergence of the cluster.
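Both the bracketed counts and the highlighted node can be computed from the members' lineages. A minimal sketch, assuming each protein's taxonomy is available as a root-to-leaf list of taxon names; the lineages in the example are abbreviated and invented for illustration, not taken from the Q6EWI1 cluster.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Lineage = Sequence[str]  # root-to-leaf taxonomy, e.g. ["Eukaryota", "Metazoa", ...]


def node_counts(lineages: List[Lineage]) -> Dict[Tuple[str, ...], int]:
    """Count proteins below each node (the bracketed numbers in the tree).

    A node is identified by its full path from the root, so identically named
    taxa on different branches stay distinct. Empty lineages (unknown
    taxonomy) contribute nothing, mirroring their omission from the tree.
    """
    counts: Counter = Counter()
    for lineage in lineages:
        for depth in range(1, len(lineage) + 1):
            counts[tuple(lineage[:depth])] += 1
    return dict(counts)


def lowest_common_node(lineages: List[Lineage]) -> Optional[str]:
    """Deepest taxon shared by every known lineage (the highlighted node)."""
    known = [lin for lin in lineages if lin]
    if not known:
        return None
    common = None
    for taxa in zip(*known):  # walk down all lineages level by level
        if all(t == taxa[0] for t in taxa):
            common = taxa[0]
        else:
            break
    return common


if __name__ == "__main__":
    members = [
        ["Eukaryota", "Metazoa", "Bilateria", "Chordata", "Mammalia"],
        ["Eukaryota", "Metazoa", "Bilateria", "Chordata", "Aves"],
        ["Eukaryota", "Metazoa", "Bilateria", "Arthropoda"],
        [],  # unknown taxonomy: omitted
    ]
    print(lowest_common_node(members))                                   # Bilateria
    print(node_counts(members)[("Eukaryota", "Metazoa", "Bilateria")])   # 3
```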

Searching by Annotation

Clicking on the "Annotations" tab allows you to search by both the presence and absence of annotations. This can be useful for finding proteins of interest that may not be well studied.
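Conceptually, each criterion either requires an annotation to be present or requires it to be absent, and a cluster matches only if every criterion holds. A minimal sketch of that logic, assuming a hypothetical representation in which each cluster's annotations are a set of labels; the labels are made up and do not correspond to Bluster's actual form fields.

```python
from typing import Dict, Iterable, List, Set


def matching_clusters(annotations: Dict[str, Set[str]],
                      required: Iterable[str] = (),
                      excluded: Iterable[str] = ()) -> List[str]:
    """Leaders of clusters that have every `required` label and no `excluded` label."""
    req, exc = set(required), set(excluded)
    return sorted(leader for leader, labels in annotations.items()
                  if req <= labels and not (exc & labels))


if __name__ == "__main__":
    # Hypothetical labels for three hypothetical clusters.
    annotations = {
        "LEADER_A": {"Mammalia", "neural development", "domain"},
        "LEADER_B": {"Mammalia", "neural development", "domain", "Aves"},
        "LEADER_C": {"Mammalia", "domain"},
    }
    print(matching_clusters(annotations,
                            required={"Mammalia", "neural development", "domain"},
                            excluded={"Aves"}))
```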

As an example, we'll search for proteins that are common to mammals but not all vertebrates, are associated with neural development, have domain associations and have not been studied in great detail.

To do this we will search for clusters that:

A search form filled out as described above is shown below. Please fill out the form accordingly and click "Submit Query". The search should take between 30 and 60 seconds.

Note that in Bluster Annotation searches, a protein is considered to have a literature association only if the associated literature has 12 or fewer total protein associations. Literature with more associations tends to consist of broader studies, often EST or genomic surveys, and is therefore uninformative about the specific protein with which it is associated.
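The 12-association rule amounts to a two-pass filter: first count how many proteins each PubMed ID is attached to, then keep only the IDs at or below the cutoff. A minimal sketch, assuming a hypothetical list of (protein ID, PubMed ID) pairs as input; only the cutoff of 12 comes from the text above.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

MAX_ASSOCIATIONS = 12  # literature linked to more proteins is treated as uninformative


def specific_literature(pairs: Iterable[Tuple[str, str]],
                        max_assoc: int = MAX_ASSOCIATIONS) -> Dict[str, Set[str]]:
    """Map each protein to the PubMed IDs that cite at most `max_assoc` proteins.

    Proteins whose only literature is broad do not appear in the result,
    i.e. they count as having no literature association.
    """
    pairs = list(pairs)
    per_pmid = Counter(pmid for _, pmid in set(pairs))  # distinct proteins per PubMed ID
    result: Dict[str, Set[str]] = defaultdict(set)
    for protein, pmid in pairs:
        if per_pmid[pmid] <= max_assoc:
            result[protein].add(pmid)
    return dict(result)


if __name__ == "__main__":
    pairs = [(f"PROT{i}", "PMID_GENOME") for i in range(20)]  # broad study
    pairs.append(("PROT0", "PMID_FOCUSED"))                    # specific study
    lit = specific_literature(pairs)
    print(lit)             # {'PROT0': {'PMID_FOCUSED'}}
    print("PROT1" in lit)  # False: only broad literature, so no association
```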

When the search completes, a list of clusters meeting the search criteria is returned. In this case 15 clusters meet the criteria; the first result is shown below.

The results are sortable by leader or by the number of members in the cluster. Note that there are two PubMed ID columns: one listing all PubMed references and one listing only references associated with fewer than 13 proteins, as described above. Clicking on the leader brings you to the cluster view, similar to that described previously in the sequence search section.

Membership Search

This is the simplest search: it lets you determine which cluster a given protein belongs to. Entering the protein ID in the entry field brings you to the corresponding cluster view page, similar to that described previously in the sequence search section.
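If each protein belongs to exactly one cluster, a membership lookup only requires inverting the leader-to-members mapping once; each query is then a single dictionary lookup. A minimal sketch, assuming a hypothetical in-memory mapping from leader ID to member IDs (how Bluster actually stores its clusters is not described here):

```python
from typing import Dict, List, Optional


def build_member_index(clusters: Dict[str, List[str]]) -> Dict[str, str]:
    """Invert leader -> members into member -> leader.

    Assumes each protein belongs to exactly one cluster; a protein found in
    two clusters raises ValueError instead of being silently reassigned.
    """
    index: Dict[str, str] = {}
    for leader, members in clusters.items():
        for member in set(members) | {leader}:  # a leader belongs to its own cluster
            if member in index and index[member] != leader:
                raise ValueError(f"{member} is in clusters {index[member]} and {leader}")
            index[member] = leader
    return index


def find_cluster(index: Dict[str, str], protein_id: str) -> Optional[str]:
    """Leader of the cluster containing protein_id, or None if it is unknown."""
    return index.get(protein_id)


if __name__ == "__main__":
    index = build_member_index({"P10": ["P10", "P11", "P12"], "P20": ["P20"]})
    print(find_cluster(index, "P12"))  # P10
    print(find_cluster(index, "P99"))  # None
```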