Reference Sequences

Genome References

The ENCODE project uses reference genomes from NCBI or UCSC to provide a consistent framework for mapping high-throughput sequencing data. In general, ENCODE data are mapped consistently to two human (GRCh38, hg19) and two mouse (mm9, mm10) genome assemblies for historical comparability. Drosophila melanogaster experiments are mapped to either dm3 or dm6, and Caenorhabditis elegans experiments are mapped to either ce10 or ce11. The official reference files for the uniform processing pipelines can be found in File Set ENCSR425FOI and File Set ENCSR884DHJ. In addition to the genome sequences (we generally use the "no alt" version for each genome), these file sets contain a variety of other crucial files: GENCODE transcript references, chromosome size files, the phage lambda genome, etc.
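
To illustrate one of the auxiliary file types mentioned above: a chromosome size file is conventionally a two-column, tab-separated listing of sequence name and length. The following is a minimal sketch of how such a listing could be derived from a FASTA file. It is not the DCC's own tooling; the function name `chrom_sizes` and the toy sequences are hypothetical and used only for illustration.

```python
import io
from typing import Dict, Iterable


def chrom_sizes(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence name: length} for FASTA text supplied line by line."""
    sizes: Dict[str, int] = {}
    name = None
    for raw in fasta_lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: take the first whitespace-delimited token as the name.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            sizes[name] = 0
        elif name is not None:
            # Sequence line: add its length to the current record.
            sizes[name] += len(line)
    return sizes


# Toy input (hypothetical sequences, not real reference data).
toy_fasta = io.StringIO(">chr1 example\nACGT\nAC\n>chrM\nGGG\n")
for chrom, length in chrom_sizes(toy_fasta).items():
    print(f"{chrom}\t{length}")
```

For a real reference, the same function could be given an open file handle for the decompressed FASTA, since it only iterates over lines.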


Collection of References

Many experiments at the ENCODE portal have not been processed by the DCC and may have used different references.
The References search page includes every released reference file used by all projects whose data is collected by the ENCODE DCC.


Reference File Sets ENCSR425FOI & ENCSR884DHJ

File sets ENCSR425FOI and ENCSR884DHJ include all files used by the ENCODE DCC to run its uniform pipelines. For your convenience, the FASTA and GENCODE transcript files are linked directly below. For further information, please contact encode-help@lists.stanford.edu.

Reference File | Purpose
GRCh38_no_alt_analysis_set_GCA_000001405.15 [download] | GRCh38 XY reference genome (ENCODE3 used only one reference genome for analysis)
ENCFF824ZKD [download] & ENCFF316JQJ [download] | GRCh38 GENCODE V24 gtf and tar files
female.hg19 [download] | hg19 XX reference genome (ENCODE2 used sex-specific genomes for analysis)
male.hg19 [download] | hg19 XY reference genome (ENCODE2 used sex-specific genomes for analysis)
gencode.v19.annotation [download] | hg19 GENCODE V19 gtf file
ENCFF908UQN [download] | spike-in sequence used for RNA-seq analysis
lambda.fa [download] | phage λ wild type assembly J02459.1 (used as a methylation negative control)
mm10_no_alt_analysis_set_ENCODE [download] | mm10 XY reference genome (ENCODE3 used only one reference genome for analysis)
gencode.vM7.annotation [download] | mm10 GENCODE M7 gtf file
gencode.vM4.annotation [download] | mm10 GENCODE M4 gtf file
female.mm10 [download] | XX mm10 "minimal", which has been replaced by mm10_no_alt_analysis_set_ENCODE.fasta
male.mm10 [download] | XY mm10 "minimal", which has been replaced by mm10_no_alt_analysis_set_ENCODE.fasta
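
The choice between a single reference and sex-specific references listed above can be sketched as a small lookup. This is only an illustration under stated assumptions: the file names are those given in the table, while the dictionary `REFERENCE_FASTA`, the function `reference_for`, and the use of "female"/"male" as keys are hypothetical and not part of any ENCODE software.

```python
from typing import Dict, Optional, Tuple

# Genome FASTA names taken from the table; the (assembly, sex) key scheme is hypothetical.
REFERENCE_FASTA: Dict[Tuple[str, Optional[str]], str] = {
    ("GRCh38", None): "GRCh38_no_alt_analysis_set_GCA_000001405.15",
    ("hg19", "female"): "female.hg19",
    ("hg19", "male"): "male.hg19",
    ("mm10", None): "mm10_no_alt_analysis_set_ENCODE",
}


def reference_for(assembly: str, sex: Optional[str] = None) -> str:
    """Return the genome reference name for an assembly and, where needed, a sex."""
    # Sex-specific entry (the table lists these for hg19).
    if (assembly, sex) in REFERENCE_FASTA:
        return REFERENCE_FASTA[(assembly, sex)]
    # Assemblies with a single reference ignore the sex argument.
    if (assembly, None) in REFERENCE_FASTA:
        return REFERENCE_FASTA[(assembly, None)]
    raise KeyError(f"no reference listed for assembly={assembly!r}, sex={sex!r}")


print(reference_for("hg19", "female"))
print(reference_for("GRCh38", "male"))
```

Requesting hg19 without a sex raises `KeyError` in this sketch, because the table lists only sex-specific hg19 genomes.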


Updated September 12th, 2016.