Version 1.0 / June 2001
Genes2Diseases was run during June 2001
on 455 human diseases extracted from the EntrezGene
sequence database, each mapped to a chromosomal region but annotated as "phenotype only".
The MEDLINE database
was licensed to us and provided by NCBI in February 2001 (containing 10 725 796
references to scientific papers).
The entries annotated with MeSH terms were analysed in order to
deduce 6 023 924 pairs of relations between
6 992 MeSH C terms and 5 070 MeSH D terms. At that time, practically no
entry corresponding to the year 2000 was yet annotated with MeSH terms.
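
The pair deduction described above can be illustrated with a minimal sketch.
It assumes, purely for illustration, that a relation is recorded whenever a
MeSH C (disease) term and a MeSH D (chemical) term annotate the same entry;
the actual criterion used by Genes2Diseases is not stated here. The function
name and the toy entries are hypothetical.

```python
from collections import Counter
from itertools import product


def count_term_pairs(entries):
    """Count, for each (C term, D term) pair, how many entries carry both.

    `entries` is an iterable of (c_terms, d_terms) tuples, one per
    annotated reference. This co-annotation rule is an assumption made
    for illustration, not the documented Genes2Diseases method.
    """
    pair_counts = Counter()
    for c_terms, d_terms in entries:
        # Use sets so a term repeated within one entry is counted once.
        for pair in product(set(c_terms), set(d_terms)):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical toy data: two references with their C and D annotations.
entries = [
    (["Anemia"], ["Hemoglobins", "Iron"]),
    (["Anemia", "Thalassemia"], ["Hemoglobins"]),
]
pairs = count_term_pairs(entries)
```

Each key of `pairs` is one candidate relation, and its count is the number
of entries supporting it.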
A total of 10 329 annotated eukaryotic sequences were taken from RefSeq.
From their annotations with GO terms linked to
MEDLINE entries, we deduced relations for 98 969 pairs
of the 5 070 MeSH D terms to 2 379 GO terms.
A further 27 diseases solved recently (during 2000-2001), but for which
no data on the final characterization was available in the database used,
were used as a benchmark.
A more extensive benchmark was carried out on 100 diseases by removing
all papers describing the disease from the database.
A total of 185 different scripts amounting to 16 801 lines of code
were written in order to run Genes2Diseases version 1.0 and
its server.