Reference Sequences

Genome References

The ENCODE project uses reference genomes from NCBI or UCSC to provide a consistent framework for mapping high-throughput sequencing data. In general, ENCODE data are mapped to two human assemblies (GRCh38 and hg19) and two mouse assemblies (mm9 and mm10) for historical comparability. Drosophila melanogaster experiments are mapped to either dm3 or dm6, and Caenorhabditis elegans experiments are mapped to ce10 or ce11. The official reference files for the uniform processing pipelines can be found in File Set ENCSR425FOI and File Set ENCSR884DHJ. In addition to the genome sequences (we generally use the "no alt" version of each genome), these file sets contain other essential files, such as GENCODE transcript references, chromosome size files, and the phage lambda genome.
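The chromosome size files mentioned above list each sequence name with its length. As a rough illustration of what such a file contains, the following Python sketch derives name and length pairs from FASTA text. The function name `chrom_sizes` and the toy input are hypothetical, and this is not necessarily how the ENCODE DCC generates its official files.

```python
from typing import Dict, Iterable


def chrom_sizes(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence name: length} for FASTA text supplied line by line."""
    sizes: Dict[str, int] = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: use the first whitespace-delimited token as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            sizes[name] = 0
        elif name is not None:
            # Sequence line: add its length to the current record.
            sizes[name] += len(line)
    return sizes


# Toy input for illustration only; this is not real genome sequence.
toy = [">chr1 example", "ACGTAC", "GT", ">chrM", "ACG"]
for chrom, size in chrom_sizes(toy).items():
    print(f"{chrom}\t{size}")
```

For a real reference, the same function could be fed an open file handle (for example one returned by `gzip.open(path, "rt")` for a compressed FASTA), since it only iterates over lines.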


Collection of References

Some of the experiments on the ENCODE portal have not been processed by the DCC uniform processing pipelines and may have used different reference files.
The References search page lists all the reference datasets used by the different projects whose data can be found on the portal.


Reference File Sets 

Datasets ENCSR425FOI and ENCSR884DHJ include the files used for uniform processing by the ENCODE DCC. For your convenience, the GRC genome assembly and GENCODE annotation files are linked directly below. For further information, please contact encode-help@lists.stanford.edu.
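For scripted access, it can help to check an accession before building a request. The Python sketch below validates accessions of the shape seen on this page (ENCSR or ENCFF, three digits, three uppercase letters) and builds a metadata URL. The base URL, the `/{accession}/?format=json` layout, and the helper names are assumptions made for illustration; check them against the portal's REST API documentation before relying on them.

```python
import re

# Assumed portal base URL; not stated on this page.
PORTAL = "https://www.encodeproject.org"

# Pattern inferred from the accessions listed on this page,
# e.g. ENCSR425FOI (dataset) and ENCFF159KBI (file).
ACCESSION_RE = re.compile(r"ENC(SR|FF)\d{3}[A-Z]{3}")


def is_accession(value: str) -> bool:
    """Return True if value looks like an ENCSR or ENCFF accession."""
    return ACCESSION_RE.fullmatch(value) is not None


def metadata_url(accession: str) -> str:
    """Build a JSON metadata URL (assumed layout) for an accession."""
    if not is_accession(accession):
        raise ValueError(f"not a recognised accession: {accession!r}")
    return f"{PORTAL}/{accession}/?format=json"


for acc in ("ENCSR425FOI", "ENCSR884DHJ"):
    print(metadata_url(acc))
```

The sketch only constructs strings and makes no network request, so it is safe to run offline.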

Genome assemblies:

Reference File | Description
GRCh38_no_alt_analysis_set_GCA_000001405.15 [download] | GRCh38 XY reference genome (ENCODE3 used only one reference genome for analysis)
female.hg19 [download] | hg19 XX reference genome (ENCODE2 used sex-specific genomes for analysis)
male.hg19 [download] | hg19 XY reference genome (ENCODE2 used sex-specific genomes for analysis)
mm10_no_alt_analysis_set_ENCODE [download] | mm10 XY reference genome (ENCODE3 used only one reference genome for analysis)
female.mm10 [download] | XX mm10 "minimal", which has been replaced by mm10_no_alt_analysis_set_ENCODE.fasta
male.mm10 [download] | XY mm10 "minimal", which has been replaced by mm10_no_alt_analysis_set_ENCODE.fasta
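The distinction between sex-specific references (hg19, ENCODE2) and a single reference per assembly (GRCh38 and mm10, ENCODE3) can be expressed as a small lookup. The Python sketch below takes its file names from the list above; the function name `reference_for` and the `sex` argument values are hypothetical, and the superseded mm10 "minimal" files are deliberately left out.

```python
from typing import Optional

# Assemblies with one reference for all samples (ENCODE3 convention).
SINGLE_REFERENCE = {
    "GRCh38": "GRCh38_no_alt_analysis_set_GCA_000001405.15",
    "mm10": "mm10_no_alt_analysis_set_ENCODE",
}

# Assemblies with sex-specific references (ENCODE2 convention).
SEX_SPECIFIC_REFERENCE = {
    ("hg19", "female"): "female.hg19",
    ("hg19", "male"): "male.hg19",
}


def reference_for(assembly: str, sex: Optional[str] = None) -> str:
    """Return the reference file name for an assembly and, if needed, a sex."""
    if assembly in SINGLE_REFERENCE:
        return SINGLE_REFERENCE[assembly]
    key = (assembly, sex)
    if key in SEX_SPECIFIC_REFERENCE:
        return SEX_SPECIFIC_REFERENCE[key]
    raise KeyError(f"no reference listed for assembly={assembly!r}, sex={sex!r}")


print(reference_for("GRCh38"))
print(reference_for("hg19", "female"))
```

Raising an error for unknown combinations, rather than falling back to a default, avoids silently mapping data to the wrong genome.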

 

Genome annotations:

Reference File | Description
ENCFF159KBI [download] | GRCh38 GENCODE V29 merged annotations gtf file
ENCFF824ZKD [download] & ENCFF316JQJ [download] | GRCh38 GENCODE V24 gtf and tar files
gencode.v19.annotation [download] | hg19 GENCODE V19 gtf file
ENCFF871VGR [download] | mm10 GENCODE VM21 merged annotations gtf file
gencode.vM7.annotation [download] | mm10 GENCODE M7 gtf file
gencode.vM4.annotation [download] | mm10 GENCODE M4 gtf file
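After downloading one of the GTF annotation files above, a quick sanity check is to tally gene records by type. The Python sketch below assumes the standard nine-column, tab-separated GTF layout and the `gene_type` attribute used in GENCODE files. The function name and the toy records are made up for illustration and are not taken from any ENCODE file.

```python
import re
from collections import Counter
from typing import Iterable

# GENCODE GTF attributes look like: gene_id "..."; gene_type "protein_coding"; ...
GENE_TYPE_RE = re.compile(r'gene_type "([^"]+)"')


def count_gene_types(gtf_lines: Iterable[str]) -> Counter:
    """Count 'gene' features per gene_type in GTF text supplied line by line."""
    counts: Counter = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header comments
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        match = GENE_TYPE_RE.search(fields[8])
        if match:
            counts[match.group(1)] += 1
    return counts


# Toy records for illustration only; coordinates and IDs are invented.
toy = [
    "##description: toy example",
    "\t".join(["chr1", "TOY", "gene", "100", "900", ".", "+", ".",
               'gene_id "G1"; gene_type "protein_coding";']),
    "\t".join(["chr1", "TOY", "transcript", "100", "900", ".", "+", ".",
               'gene_id "G1"; gene_type "protein_coding";']),
    "\t".join(["chr2", "TOY", "gene", "50", "400", ".", "-", ".",
               'gene_id "G2"; gene_type "lncRNA";']),
]
print(count_gene_types(toy))
```

Because the function only iterates over lines, a compressed download could be passed in through `gzip.open(path, "rt")` without loading the whole file into memory.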

 

Additional useful files for uniform processing pipelines:

Reference File | Description
ENCFF908UQN [download] | spike-in sequence used for RNA-seq analysis
lambda.fa [download] | phage λ wild type assembly J02459.1 (used as a methylation negative control)
ENCFF200UUD [download] | MINT-ChIP hg19 blacklist
ENCFF023CZC [download] | MINT-ChIP GRCh38 blacklist

 

Updated August 15th, 2019