We have updated our web server. This is the old tutorial. You can still use it to resolve some questions but there are differences. An updated tutorial will be available soon.
This page explains how to use the G2D web server with some simple
examples.
Use PHENOTYPE
By entering the OMIM (MIM) number and a chromosomal region in the appropriate boxes, you can perform a customized analysis for the disease of your choice. You may want to do this if the disorder you are interested in is not listed among the precomputed analyses, or if it is listed but you want to try a different linkage region.
Suppose that you are interested in candidate genes associated with Hirschsprung disease in 22q13.2.
Next, you have to enter the chromosomal region in the LOCATION BOX. You may specify the chromosomal region in three different formats: base positions (e.g. 35000000 45000000), cytogenetic markers (e.g. D9S201 D9S298), or cytogenetic bands (e.g. q22). Type the region and select both the chromosome and the format you chose (Band(s), Marker(s) or Positions). Here we type "q13.2" and select "Bands" and "22" accordingly (see figure 2). If you type a single marker, the system will consider a region of 5 Mb around it. Currently, there is an upper limit of 50 Mb on the size of the chromosomal region. If your target region is bigger, you will have to split it into several queries.
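The 50 Mb limit means that a larger target region has to be divided before it is submitted. The following is a minimal Python sketch of that bookkeeping, not part of G2D itself: the function name `split_region` is hypothetical, and it assumes the region is given as inclusive base positions on a single chromosome.

```python
MAX_REGION_BP = 50_000_000  # 50 Mb upper limit stated in this tutorial


def split_region(start, end, max_size=MAX_REGION_BP):
    """Split the inclusive interval [start, end] into consecutive,
    non-overlapping sub-regions of at most max_size bases each."""
    if start > end:
        raise ValueError("start must not exceed end")
    chunks = []
    pos = start
    while pos <= end:
        # Each chunk covers at most max_size bases, counted inclusively.
        chunk_end = min(pos + max_size - 1, end)
        chunks.append((pos, chunk_end))
        pos = chunk_end + 1
    return chunks


# A region within the limit is returned unchanged as a single query.
print(split_region(35_000_000, 45_000_000))
# A 120 Mb region becomes three queries.
print(split_region(1, 120_000_000))
```

Each returned pair can then be typed into the LOCATION BOX with the "Positions" format. Whether adjacent sub-regions should overlap slightly, so that a gene spanning a boundary is not missed, is a judgment call that this sketch leaves out.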
Once you have entered the phenotype and the region, you may change some output options in the OUTPUT BOX. You can modify the E-value threshold for the BLAST searches, the maximum number of candidates to be displayed, and whether or not a subsequent filtering of candidate genes based on RefSeq genes is performed.
The default options ("1e-10" and "collapsed") are recommended (see figure 3). The E-value sets the similarity threshold that a homologous protein hitting the region must meet to be considered. Regarding the organization of the candidates, G2D does not make gene predictions from the BLAST hits it uses. In previous versions, if several proteins showed sequence homology to the same genomic region, the corresponding BLAST hits were displayed as separate candidate genes (sorted by their respective proteins' GO-scores). We have now added the possibility of removing this redundancy from the output by using an existing gene prediction: the BLAST hits are compared with RefSeq genes, and if several proteins hit the same RefSeq gene, only the best-scoring protein hit is shown as a candidate. To use this option, select "collapsed" (default) in the multiple-choice menu "Organize the candidates by RefSeq genes" in the OUTPUT BOX. If you want to see all the proteins that hit the same candidate, select "verbose". Finally, if you select "none", this analysis will not be performed (i.e. the output will be as in previous versions of G2D).
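The difference between the three settings can be illustrated with a small Python sketch. This is not G2D's code: the function `organize_candidates`, the dictionary keys, and the toy protein and gene names are all hypothetical, and the sketch assumes that "best scoring" refers to the GO-score.

```python
def organize_candidates(hits, mode="collapsed"):
    """Group BLAST hits into candidates.

    hits: list of dicts with keys 'protein', 'refseq_gene', 'go_score'.
    Returns a list of candidates; each candidate is a list of hits.
    """
    # Highest GO-score first, as in the sorted output described above.
    ranked = sorted(hits, key=lambda h: h["go_score"], reverse=True)
    if mode == "none":
        # No RefSeq comparison: every hit is its own candidate.
        return [[h] for h in ranked]
    groups = {}  # insertion order follows the best hit of each gene
    for h in ranked:
        groups.setdefault(h["refseq_gene"], []).append(h)
    if mode == "verbose":
        # All proteins hitting the same RefSeq gene are kept together.
        return list(groups.values())
    if mode == "collapsed":
        # Only the best-scoring hit per RefSeq gene is kept.
        return [[members[0]] for members in groups.values()]
    raise ValueError(f"unknown mode: {mode}")


# Toy data with made-up names and scores.
hits = [
    {"protein": "P1", "refseq_gene": "GENE_A", "go_score": 0.4},
    {"protein": "P2", "refseq_gene": "GENE_A", "go_score": 0.9},
    {"protein": "P3", "refseq_gene": "GENE_B", "go_score": 0.6},
]
for mode in ("collapsed", "verbose", "none"):
    names = [[h["protein"] for h in cand] for cand in organize_candidates(hits, mode)]
    print(mode, names)
```

With this toy input, "collapsed" yields two candidates, "verbose" yields the same two candidates with every supporting protein listed, and "none" yields three separate candidates.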
The analysis involves heavy computation and our server also hosts other services, so for quicker results restrict your query by limiting the number of RefSeq proteins taken into account. Also try to focus the search on narrow regions, but be aware that extending the region by at least 6 Mb beyond the flanking markers is highly advisable. Finally, click the "Find candidate genes" button (figure 4).
Use KNOWN GENES
An alternative way to find genes associated with an inherited disease is to provide genes already known to produce a similar variant of the disease of interest. The expectation is that the gene you are looking for will have a function similar to theirs.
Suppose that you are interested in candidate genes associated with Hirschsprung disease in 22q13.2. Hirschsprung disease is suspected to be a multigenic disease, and mutations in the endothelin-B receptor gene (ETRB) are known to cause one of its variants. To find genes in 22q13.2 that are functionally related to ETRB, you first have to enter the chromosomal region in the LOCATION BOX. You may specify the chromosomal region in three different formats: base positions (e.g. 35000000 45000000), cytogenetic markers (e.g. D9S201 D9S298), or cytogenetic bands (e.g. q22). Type the region and select both the chromosome and the format you chose (Band(s), Marker(s) or Positions). Here we type "q13.2" and select "Bands" and "22" accordingly (see figure 5). If you type a single marker, a region of 5 Mb around it will be considered. Currently, there is an upper limit of 50 Mb on the size of the chromosomal region. If your target region is bigger, you will have to split it into several queries.
Next, in the KNOWN GENES BOX, enter the Entrez Gene identifier of one or more genes that are known or suspected to be associated with your phenotype of interest. You may introduce either human or mouse genes. We input here the Entrez gene identifiers of mouse and human genes corresponding to ETRB (see figure 6).
Once you have entered the known genes and the region, you may change some output options in the OUTPUT BOX. You can modify the E-value threshold for the BLAST searches, the maximum number of candidates to be displayed, and whether or not a subsequent filtering of candidate genes based on RefSeq genes is performed.
The default options ("1e-10" and "collapsed") are recommended (see figure 7). The E-value sets the similarity threshold that a homologous protein hitting the region must meet to be considered. Regarding the organization of the candidates, G2D does not make gene predictions from the BLAST hits it uses. In previous versions, if several proteins showed sequence homology to the same genomic region, the corresponding BLAST hits were displayed as separate candidate genes (sorted by their respective proteins' GO-scores). We have now added the possibility of removing this redundancy from the output by using an existing gene prediction: the BLAST hits are compared with RefSeq genes, and if several proteins hit the same RefSeq gene, only the best-scoring protein hit is shown as a candidate. To use this option, select "collapsed" (default) in the multiple-choice menu "Organize the candidates by RefSeq genes" in the OUTPUT BOX. If you want to see all the proteins that hit the same candidate, select "verbose". Finally, if you select "none", this analysis will not be performed (i.e. the output will be as in previous versions of G2D).
The analysis involves heavy computation and our server also hosts other services, so for quicker results restrict your query by limiting the number of RefSeq proteins taken into account. Also try to focus the search on narrow regions, but be aware that extending the region by at least 6 Mb beyond the flanking markers is highly advisable. Finally, click the "Find candidate genes" button (figure 8).
Use INTERACTIONS
When a disease phenotype has been mapped to more than one locus, the responsible genes at the different locations may be involved in the same pathway or may even interact directly. To make use of this possibility, we have implemented a way to check whether there are known or predicted interactions between a protein in the problem locus and another one located in a second locus. We use the set of human protein-protein interactions from the STRING database. As an example, we take one of the diseases from our benchmark, Exudative Vitreoretinopathy (OMIM 133780). This disease has been linked to several genes, two of which are LRP5, low density lipoprotein receptor related protein 5 (Entrez Gene 4041), and FZD4, frizzled homolog 4 (Entrez Gene 8322). These genes are located on chromosome 11 at loci q13 and q14, respectively.
Suppose that you are interested in candidate genes for this phenotype on 11q13 and that you know that the disease has also been mapped to 11q14. First, you have to enter the chromosomal region where you are looking for candidates in the LOCATION BOX. You may specify the chromosomal region in three different formats: base positions (e.g. 35000000 45000000), cytogenetic markers (e.g. D9S201 D9S298), or cytogenetic bands (e.g. q22). Type the region and select both the chromosome and the format you chose (Band(s), Marker(s) or Positions). Here we type "q13" and select "Bands" and "11" accordingly (see figure 9). If you type a single marker, a region of 5 Mb around it will be considered. Currently, there is an upper limit of 50 Mb on the size of the chromosomal region. If your target region is bigger, you will have to split it into several queries.
Select the maximum number of candidates you want to be displayed and click the "Find candidate genes" button (figure 10).
In the next screen, the genes from the LOCATION BOX locus that interact with any protein in the SECOND LOCUS BOX will be shown as candidates. Candidates are sorted by the STRING score of the corresponding interactions. The higher the STRING score, the more reliable the interaction. In our example, several genes from 11q13 interact with genes on 11q14 with very high STRING scores; the first two candidates interact with FZD4, and one of them is in fact LRP5 (figure 11).
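The ordering of candidates by STRING score can be sketched in Python as follows. This is only an illustration under assumptions: the function `rank_by_string_score` is hypothetical, the gene names and scores are made-up placeholders rather than values from STRING, and the sketch assumes that a candidate with several interactions is ranked by its highest-scoring one.

```python
def rank_by_string_score(interactions):
    """interactions: iterable of (candidate, partner, score) tuples, where
    candidate lies in the query locus and partner in the second locus.
    Returns one (candidate, partner, score) row per candidate, best first."""
    best = {}
    for candidate, partner, score in interactions:
        # Keep only the highest-scoring interaction of each candidate.
        if candidate not in best or score > best[candidate][1]:
            best[candidate] = (partner, score)
    rows = [(cand, partner, score) for cand, (partner, score) in best.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Placeholder data; real scores would come from the STRING database.
toy = [
    ("GENE_A", "PARTNER_1", 0.45),
    ("GENE_B", "PARTNER_1", 0.99),
    ("GENE_A", "PARTNER_2", 0.80),
]
for row in rank_by_string_score(toy):
    print(row)
```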
By clicking on [STRING] next to the interaction partners of a candidate, you will be taken to the corresponding STRING database entry, where you can check the evidence supporting the interaction (figure 12).
Description of a candidate
At the top of the results page you will find the details of your query (figures 13 and 14). Next comes a box with the "reasons" the system used to derive the results from your query. For a PHENOTYPE query, these consist of the associations between GO terms and the MeSH C terms from the papers associated with the phenotype in OMIM. Follow the hyperlinks on the arrows to explore the associations (figure 13).
For a KNOWN GENES query, the reasons consist of a list of scored GO terms, indicating the score a candidate will receive if it is annotated with a particular GO term or any of its descendants (figure 14).
Candidates are presented in order of their likelihood of being the causative genes according to the GO-score (figure 15). For details on how this is computed refer to here. The higher the GO-score, the better the candidate is supposed to be. A second value appears next to the GO-score: the R-score, or relative score. It is the candidate's rank minus one, divided by the total number of proteins in the RefSeq set used. The closer the R-score is to zero, the more interesting the candidate looks "according to current knowledge"; higher R-score values correspond to a less expected relation between candidate gene and phenotype.
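The R-score definition above translates directly into a short calculation. The Python sketch below is illustrative only: the function name `r_score` is hypothetical, and the RefSeq set size of 20000 is a made-up figure, not the size of the set G2D uses.

```python
def r_score(rank, total_proteins):
    """Relative score: (rank - 1) / total number of proteins in the set.
    rank is 1-based, with rank 1 being the best GO-score."""
    if not 1 <= rank <= total_proteins:
        raise ValueError("rank must lie between 1 and total_proteins")
    return (rank - 1) / total_proteins


# The top-ranked protein always gets an R-score of zero.
print(r_score(1, 20000))
# A protein ranked 51st in a hypothetical set of 20000 proteins.
print(r_score(51, 20000))
```

Because the rank is divided by the size of the set, R-scores from analyses run against RefSeq sets of different sizes remain comparable, which the raw rank would not be.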