Version history of G2D

Version 5.0 / Oct 2010

The main change in this version is that we have dropped the BLAST comparisons used to transfer the Gene Ontology annotation. Instead, we have chosen to simply extend human GO annotations by transferring them from homologs using HomoloGene. The reason is that both annotations and genomic sequences have improved greatly in quality since we started G2D nine years ago. We have also fully updated all data sources.

Version 4.1 / Mar 2007

When using the KNOWN GENES option, those candidates overlapping with a gene that has a STRING interaction with the user's input known genes are now flagged.

Version 4.0 / Oct 2006

When a disease phenotype has been mapped to more than one locus, the responsible genes in the different locations may be involved in the same pathway or may even interact directly. To make use of this possibility, we have implemented a way to check whether there are known or predicted interactions between a protein in the problem locus and another one located in a second locus. The query form can be accessed here. Functional protein associations are taken from the STRING database. Priorities are established using the STRING score: the higher the score, the more reliable the interaction. For details on how this procedure performed in a benchmark, see the supplementary information.

For an example using protein interactions in the server, see the tutorial.
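The cross-locus check described above can be pictured with a minimal sketch. It assumes the interactions are already available as a lookup from an unordered protein pair to a score; the function name, the protein identifiers, and the score values are all hypothetical, and this is not the server's actual code.

```python
def rank_cross_locus_interactions(locus_a, locus_b, interactions):
    """Return (protein_a, protein_b, score) tuples, highest score first.

    locus_a, locus_b: iterables of protein identifiers in each locus.
    interactions: dict mapping frozenset({p, q}) to an interaction score
    (hypothetical stand-in for data taken from STRING).
    """
    pairs = []
    for p in locus_a:
        for q in locus_b:
            score = interactions.get(frozenset((p, q)))
            if score is not None:
                pairs.append((p, q, score))
    # Higher score = more reliable interaction, so it gets higher priority.
    return sorted(pairs, key=lambda t: t[2], reverse=True)


# Toy data: identifiers and scores are invented for illustration only.
toy_interactions = {
    frozenset(("A1", "B2")): 0.45,
    frozenset(("A2", "B1")): 0.92,
}
ranked = rank_cross_locus_interactions(["A1", "A2"], ["B1", "B2"], toy_interactions)
```

Keying the lookup on a `frozenset` makes the pair order irrelevant, which matches the idea that an interaction is symmetric.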

Version 3.0 / Mar 2006

An alternative to the G2D association score (based on a description of the phenotype) for finding genes associated with an inherited disease is to provide genes already known to produce a similar variant of the disease of interest. The expectation is that the gene you are looking for will have a function similar to theirs. This is a more stringent approach than the association via MeSH C terms, but it is obviously correct in some cases. We have included this option; it can be accessed here.

We have prepared a tutorial explaining the different uses of the server through simple examples.

Version 2.3 / Feb 2006

G2D doesn't make gene predictions from the BLAST hits it uses. Until now, if several proteins showed sequence homology with the same genomic region, the corresponding BLAST hits were displayed as separate candidate genes (sorted by their respective proteins' GO-scores).

We have now added the possibility of organizing the output according to an existing gene prediction, by comparing the BLAST hits with RefSeq genes. In this way, you can eliminate redundancy among hits overlapping the same RefSeq gene.

To do this, select "collapsed" in the multiple-choice "Organize the candidates by RefSeq genes" in the OUTPUT BOX. If you want to see all the proteins that hit the same candidate, select "verbose". Finally, if you select "none", this analysis will not be performed.

Note that overlapping hits from different proteins that don't match a RefSeq prediction will still be displayed as different candidates.
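The three output modes can be illustrated with a small sketch. It assumes each hit and each RefSeq gene is described by a start and end coordinate on the same chromosome, and that "collapsed" keeps only the best-scoring hit per gene; the source does not say exactly what "collapsed" displays, so that part is an assumption. The type names and the `organize` function are hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    protein: str
    start: int
    end: int
    go_score: float


class Gene(NamedTuple):
    name: str
    start: int
    end: int


def overlaps(hit, gene):
    """True if the two closed intervals share at least one base."""
    return hit.start <= gene.end and gene.start <= hit.end


def organize(hits, genes, mode="collapsed"):
    """Group hits into candidates; each candidate is a list of hits."""
    by_score = lambda h: h.go_score
    if mode == "none":
        # No RefSeq comparison: every hit is its own candidate.
        return [[h] for h in sorted(hits, key=by_score, reverse=True)]
    grouped, unmatched = {}, []
    for hit in hits:
        gene = next((g for g in genes if overlaps(hit, g)), None)
        if gene is None:
            # Hits matching no RefSeq gene stay separate candidates.
            unmatched.append([hit])
        else:
            grouped.setdefault(gene.name, []).append(hit)
    candidates = list(grouped.values()) + unmatched
    for candidate in candidates:
        candidate.sort(key=by_score, reverse=True)
    if mode == "collapsed":
        # Assumption: keep only the best-scoring hit for each candidate.
        candidates = [c[:1] for c in candidates]
    return sorted(candidates, key=lambda c: c[0].go_score, reverse=True)
```

With mode "verbose", the same grouping is returned but every protein hitting a given RefSeq gene is kept in that candidate's list.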

Version 2.2 / Dec 2005

Changed the scoring function of a protein in the updated RefSeq set from the average of its GO term scores to the maximum GO term score. Performance in our benchmark improved over the previous version: of the 100 disease-related genes, 90 were identified. In those cases, the gene was among the 3 best-scoring genes in 41.1% of the cases, among the 8 best-scoring genes in 52.2% of the cases, and among the 30 best-scoring genes in 68.8%.
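The difference between the two scoring functions is easy to see in a minimal sketch. It assumes a protein's GO term scores are available as a plain list of numbers; the function names and the example values are hypothetical.

```python
from statistics import mean


def protein_score_average(go_term_scores):
    """Previous behaviour as described: mean of the GO term scores."""
    return mean(go_term_scores)


def protein_score_maximum(go_term_scores):
    """Behaviour from version 2.2: the single best GO term score."""
    return max(go_term_scores)


# Invented scores: one strongly associated GO term among weak ones.
example_scores = [0.10, 0.15, 0.90]
average_score = protein_score_average(example_scores)
maximum_score = protein_score_maximum(example_scores)
```

With the average, a protein carrying one strongly associated GO term is penalised for its other, unrelated annotations; the maximum keeps the strong signal.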

Version 2.1 / Nov 2005

OMIM identifiers have been updated for current MEDLINE links. Several hundred more have also been added to the table.

Version 2.0 / May 2005

The algorithm is now fully available through the web server. The user has to input the disease phenotype and a genomic region. The phenotype is currently defined by the OMIM identifier of the disease (or of an equivalent one); a list of weighted MeSH-C terms is produced by counting how often the MEDLINE references linked from the OMIM entry are annotated with each MeSH-C term. The genomic region can be defined by base position, marker, or band. The number of candidates obtained can be restricted by varying two parameters: the number of candidates, and the E-value threshold for the similarity of a match in the genomic region to the RefSeq database.
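The term-weighting step can be sketched as a frequency count. This assumes the MEDLINE references linked from an OMIM entry are given as a list of MeSH-C term sets, one per reference; dividing by the number of references to obtain a weight between 0 and 1 is an assumption, since the text only says the weights come from counting. The function name and the term labels are hypothetical.

```python
from collections import Counter


def weight_mesh_c_terms(linked_references):
    """Map each MeSH-C term to the fraction of references annotated with it.

    linked_references: list of iterables of MeSH-C terms, one per MEDLINE
    reference linked from the OMIM entry.
    """
    counts = Counter()
    for terms in linked_references:
        # set() ensures a term is counted once per reference.
        counts.update(set(terms))
    total = len(linked_references)
    if total == 0:
        return {}
    # Assumption: normalise counts by the number of linked references.
    return {term: n / total for term, n in counts.items()}


# Placeholder term labels, not real MeSH headings.
toy_references = [{"TermA", "TermB"}, {"TermA"}, {"TermA", "TermC"}, {"TermB"}]
weights = weight_mesh_c_terms(toy_references)
```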

The results are now linked to information that supports or argues against the expression of the gene. Negative information is an overlap with predicted pseudogenes. Positive information is an overlap with expressed sequence tags (ESTs).

Each match in the genomic region is linked to the UCSC Genome Browser, which allows examining the latest genomic annotation of the region, including links to known or predicted genes.

The databases used by the method were updated: RefSeq, MEDLINE, human genome (Build 35, hg17). The server now includes an updated collection of pre-computed analyses of more than 550 monogenic diseases without a known associated gene, and the analysis of a complex disease: candidate lists for 17 regions linked to asthma.

Version 1.1 / February 2002

Major redesign of the server before becoming public.

Entry U59333 (Hereditary Motor and Sensory Neuropathy Russe; HMSNR; OMIM: 605285) updated with new, more accurate band information. 20 November 2001.

Entry U23723 (Prelingual progressive nonsyndromic hearing loss; OMIM: 606282) added. The table of unknown diseases contains 456 entries. 4 October 2001.

Version 1.0 / June 2001

Genes2Diseases was run during June 2001 over 455 human diseases extracted from the EntrezGene sequence database as mapped to a chromosomal region but with "phenotype only".

MEDLINE was licensed to us and provided by NCBI in February 2001 (containing 10 725 796 references to scientific papers). The entries annotated with MeSH terms were analysed in order to deduce 6 023 924 pairs of relations between 6 992 MeSH C terms and 5 070 MeSH D terms. At that time, practically no entry corresponding to the year 2000 was yet annotated with MeSH terms.

A total of 10 329 annotated eukaryotic sequences were taken from RefSeq. From their annotations with GO terms linked to MEDLINE entries, we deduced relations for 98 969 pairs of the 5 070 MeSH D terms to 2 379 GO terms.

A further 27 diseases solved recently (during 2000-2001), for which no data on the final characterization were available in the database used, were used as a benchmark.

A more extensive benchmark was carried out on 100 diseases by removing from the database all papers describing the disease.

A total of 185 different scripts amounting to 16 801 lines of code were written in order to run Genes2Diseases version 1.0 and its server.