YEAST_DB schema
---------------

- Now supported in AnnotationDbi (except for the CHRLENGTHS and REJECTORF
  maps that are missing).

- Am I right in assuming that the YEAST package is _not_ probe-based
  despite the fact that the "quality control" says:
    Mappings found for probe based rda files:
          ...
    Mappings found for non-probe based rda files:
          ...
  My understanding is that the YEAST maps are based on the
  "systematic gene names". Also misleading is the naming of the reverse maps:
  YEASTENZYME2PROBE, YEASTGO2PROBE, etc. Only one reverse map,
  COMMON2SYSTEMATIC, seems to acknowledge that the package data is based on
  the systematic gene names.

NL: Correct. YEAST is not probe-based, and most maps are indexed by systematic name.

- However, this map name "COMMON2SYSTEMATIC" is inconsistent.
  It seems to be the reverse map for the GENENAME map.
  In this case, why isn't it called GENENAME2SYSTEMATIC?
  (all other reverse maps use this naming convention of prefixing
  with the name of the direct map).

NL: Yes, GENENAME2SYSTEMATIC would be a better description of the map; whoever named it "COMMON2SYSTEMATIC" must have had a reason. Just to clarify, though: "COMMON2SYSTEMATIC" is NOT the reverse map of the GENENAME map. The GENENAME map contains the systematic-name-to-gene-name mapping for every systematic name that has a gene name. A gene name that has no systematic name is not included in the GENENAME map. The COMMON2SYSTEMATIC map contains all existing gene names and gives their systematic names where these exist. It also contains all aliases (aliases of gene names) and gives their systematic names where these exist.
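
A toy sketch of the distinction NL describes, using plain Python dicts rather than AnnotationDbi itself. Only YAL012W, CYS3 and its three aliases are taken from YEAST.sqlite; "ABC1" and "YXX001W" are hypothetical placeholders added to cover the cases NL mentions.

```python
# Toy rows: (name, systematic_name, kind).
# "ABC1" and "YXX001W" are made-up placeholders, not real yeast identifiers.
rows = [
    ("CYS3", "YAL012W", "gene_name"),
    ("CYI1", "YAL012W", "alias"),
    ("FUN35", "YAL012W", "alias"),
    ("STR1", "YAL012W", "alias"),
    ("ABC1", None, "gene_name"),     # gene name with no systematic name
    (None, "YXX001W", "gene_name"),  # systematic name with no gene name
]

# GENENAME: systematic name -> gene name, only where both exist; no aliases.
genename = {
    sysname: name
    for name, sysname, kind in rows
    if kind == "gene_name" and name is not None and sysname is not None
}

# COMMON2SYSTEMATIC: every gene name and alias -> systematic name (or None).
common2systematic = {
    name: sysname for name, sysname, _kind in rows if name is not None
}

# What a true reverse of GENENAME would contain.
genename_reversed = {name: sysname for sysname, name in genename.items()}

# Keys present in COMMON2SYSTEMATIC but absent from the reversed GENENAME map.
extra = sorted(set(common2systematic) - set(genename_reversed))
print(extra)  # ['ABC1', 'CYI1', 'FUN35', 'STR1']
```

The extra keys are exactly the aliases plus the gene name without a systematic name, which is why COMMON2SYSTEMATIC cannot be obtained by reversing GENENAME.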

- Where should I extract the GENENAME map from?
    o from the sgd table? (has cols: id, systematic_name, gene_name, sgd_id)
    o or from the gene2systematic table? (has cols: gene_name, systematic_name)
  Sometimes they give the same result, sometimes not. With YEAST.sqlite:
    sqlite> select gene_name from sgd where systematic_name='YAL012W';
    CYS3
    sqlite> select gene_name from gene2systematic where systematic_name='YAL012W';
    CYS3
    CYI1
    FUN35
    STR1
  Using the former seems to give results much closer to the current YEAST
  package (1.15.2), so this is what I've chosen for the current version of
  AnnotationDbi (0.0.24). But then the gene2systematic table is useless...
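
  One way to see how far the two tables diverge is to list the
  gene2systematic rows that have no matching sgd row. The sketch below is
  only an illustration: it runs against an in-memory database populated with
  the YAL012W rows shown above, and the column types and the sgd_id value are
  placeholders (assumptions), since only the column names are known here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Column names follow the notes; the column types are assumptions.
con.executescript("""
    CREATE TABLE sgd (id INTEGER PRIMARY KEY, systematic_name TEXT,
                      gene_name TEXT, sgd_id TEXT);
    CREATE TABLE gene2systematic (gene_name TEXT, systematic_name TEXT);
""")
con.execute(
    "INSERT INTO sgd (systematic_name, gene_name, sgd_id) VALUES (?, ?, ?)",
    ("YAL012W", "CYS3", "SGD_ID_PLACEHOLDER"),  # sgd_id is a placeholder
)
con.executemany(
    "INSERT INTO gene2systematic (gene_name, systematic_name) VALUES (?, ?)",
    [("CYS3", "YAL012W"), ("CYI1", "YAL012W"),
     ("FUN35", "YAL012W"), ("STR1", "YAL012W")],
)

# Rows of gene2systematic with no (systematic_name, gene_name) match in sgd.
extra_names = con.execute("""
    SELECT g.systematic_name, g.gene_name
    FROM gene2systematic AS g
    LEFT JOIN sgd AS s
      ON s.systematic_name = g.systematic_name
     AND s.gene_name = g.gene_name
    WHERE s.id IS NULL
    ORDER BY g.gene_name
""").fetchall()
print(extra_names)
# [('YAL012W', 'CYI1'), ('YAL012W', 'FUN35'), ('YAL012W', 'STR1')]
```

  The same SELECT can be pasted into the sqlite shell against YEAST.sqlite to
  get the full list of names that appear only in gene2systematic.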

NL:  
There are two naming systems for yeast: Systematic Name and Gene Name (or Standard Name). Detailed descriptions can be found at http://www.yeastgenome.org/help/yeastGeneNomenclature.shtml

There are four possible scopes of data coverage:
  * Named Genes Only:
    contain information only about features which have been given a Gene
    Name, either a Standard Name or a Reserved Name. Thus these files will
    NOT include information on ORFs (protein coding genes) that have not been
    given Gene Names, and WILL include information about genetic loci that
    have never been mapped to a chromosomal position, but which have been
    given Gene Names.
  * ORFs:
    contain information about all ORFs (protein coding genes), regardless of
    whether or not they are also associated with a Gene Name (i.e. a Standard
    Name or a Reserved Name).
  * Gene products:
    contain information about chromosomal features which correspond to gene
    products, either protein or RNA products, including ORF (protein coding
    genes), Ty ORF, tRNA, rRNA, snRNA, snoRNA, and other RNA gene features.
    Other sequence features (LTR, ARS, Transposon, pseudogene, and CEN) will
    not be included.
  * All chromosomal features:
    contain information about all chromosomal sequence features including ORF
    (protein coding genes), LTR, tRNA, Ty ORF, snoRNA, ARS, Transposon,
    pseudogene, rRNA, CEN, RNA gene, and snRNA features.
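
Three of these scopes are defined by feature type and nest inside one another, which a small sketch with Python sets can make explicit. The set names are hypothetical; the feature types are the ones listed above. "Named Genes Only" is left out because it is defined by naming status, not by feature type.

```python
# Feature types per scope, as listed in the scope descriptions.
ORFS = {"ORF"}
GENE_PRODUCTS = {"ORF", "Ty ORF", "tRNA", "rRNA", "snRNA", "snoRNA", "RNA gene"}
ALL_FEATURES = GENE_PRODUCTS | {"LTR", "ARS", "Transposon", "pseudogene", "CEN"}

# Each scope is a strict subset of the next one.
print(ORFS < GENE_PRODUCTS < ALL_FEATURES)  # True

# Feature types covered by "All chromosomal features" but not "Gene products".
excluded = sorted(ALL_FEATURES - GENE_PRODUCTS)
print(excluded)  # ['ARS', 'CEN', 'LTR', 'Transposon', 'pseudogene']
```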

The scope of the following objects in the envir-based YEAST package is "ORFs": CHR, CHRLOC, DESCRIPTION, ENZYME, GO, PATH, PMID, INTERPRO, PFAM, SMART, ENZYME2PROBE, PATH2PROBE, PMID2PROBE, ALIAS, GENENAME.

The scope of COMMON2SYSTEMATIC in the envir-based YEAST package is "Named Genes Only" plus all aliases.

In the DB schema:

    table                  scope                                                              corresponding maps
    ---------------------  -----------------------------------------------------------------  ------------------------
    sgd                    "Named Genes Only" UNION "All chromosomal features"                GENENAME
    chromosome_features    "All chromosomal features"                                         CHR, CHRLOC, DESCRIPTION
    gene2alias             "Named Genes Only"                                                 ALIAS
    gene2systematic        "Named Genes Only" UNION "All chromosomal features" UNION aliases  COMMON2SYSTEMATIC
    go_bp, go_mf, go_cc    "Gene products"                                                    GO
    pubmed                 "All chromosomal features"                                         PMID
    interpro, pfam, smart  "ORFs"                                                             INTERPRO, PFAM, SMART