Reference Sequences

Genome References

The ENCODE project uses reference genomes from NCBI or UCSC to provide a consistent framework for mapping high-throughput sequencing data. In general, ENCODE data are mapped to two human assemblies (GRCh38 and hg19) and two mouse assemblies (mm9 and mm10) for historical comparability. Drosophila melanogaster experiments are mapped to either dm3 or dm6, and Caenorhabditis elegans experiments are mapped to ce10 or ce11. The official reference files for each uniform processing pipeline can be found in the table below, organized by organism and pipeline. In addition to the genome sequences (we generally use the "no alt" version of each genome), a variety of other essential files are available there as well (GENCODE transcript references, chromosome size files, the phage lambda genome, etc.).


Reference File Sets 

The table below lists the files used by each pipeline for uniform processing by the ENCODE DCC, with associated details on genome assembly and annotation where applicable. For your convenience, the GRC genome assembly and GENCODE annotation files are linked directly below. For further information, please contact encode-help@lists.stanford.edu.

ENCODE4 Uniform Processing Pipeline Filesets

Organism  Pipeline(s)        Reference file set         Genome assembly & annotation
--------  -----------------  -------------------------  -------------------------------
Human     ATAC-seq           ENCSR938RZZ                GRCh38 with GENCODE version V29
Human     Bulk RNA-seq       ENCSR151GDH                GRCh38 with GENCODE version V29
Human     ChIP-seq           ENCSR174MOP (TF, Histone)  GRCh38
Human     ChIP-seq           ENCSR487FYI (MINT-ChIP)    GRCh38
Human     DNase-seq          ENCSR083JOX                GRCh38
Human     Long read RNA-seq  ENCSR925QOG                GRCh38 with GENCODE version V29
Human     microRNA-seq       ENCSR608ULQ                GRCh38 with GENCODE version V29
Human     WGBS               ENCSR475CAG                GRCh38
Mouse     ATAC-seq           ENCSR535HHO                mm10 with GENCODE version M21
Mouse     Bulk RNA-seq       ENCSR496QMW                mm10 with GENCODE version M21
Mouse     ChIP-seq           ENCSR928IAF (TF, Histone)  mm10
Mouse     ChIP-seq           ENCSR879BNV (MINT-ChIP)    mm10
Mouse     DNase-seq          ENCSR029SEE                mm10
Mouse     Long read RNA-seq  ENCSR069LTV                mm10 with GENCODE version M21
Mouse     microRNA-seq       ENCSR536KRM                mm10 with GENCODE version M21
Mouse     WGBS               ENCSR923XUC                mm10
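For scripted workflows, the pipeline table above can be held as a small lookup structure. The following is a minimal sketch, not part of the ENCODE tooling: the accessions are transcribed from the table, while the names `REFERENCE_FILE_SETS` and `lookup_reference_file_sets` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Each entry maps (organism, pipeline) to a list of (accession, variant) pairs.
# The variant is the parenthetical label from the table, or None if there is none.
# Names here are illustrative assumptions; only the accessions come from the table.
REFERENCE_FILE_SETS: Dict[Tuple[str, str], List[Tuple[str, Optional[str]]]] = {
    ("human", "atac-seq"): [("ENCSR938RZZ", None)],
    ("human", "bulk rna-seq"): [("ENCSR151GDH", None)],
    ("human", "chip-seq"): [("ENCSR174MOP", "TF, Histone"), ("ENCSR487FYI", "MINT-ChIP")],
    ("human", "dnase-seq"): [("ENCSR083JOX", None)],
    ("human", "long read rna-seq"): [("ENCSR925QOG", None)],
    ("human", "microrna-seq"): [("ENCSR608ULQ", None)],
    ("human", "wgbs"): [("ENCSR475CAG", None)],
    ("mouse", "atac-seq"): [("ENCSR535HHO", None)],
    ("mouse", "bulk rna-seq"): [("ENCSR496QMW", None)],
    ("mouse", "chip-seq"): [("ENCSR928IAF", "TF, Histone"), ("ENCSR879BNV", "MINT-ChIP")],
    ("mouse", "dnase-seq"): [("ENCSR029SEE", None)],
    ("mouse", "long read rna-seq"): [("ENCSR069LTV", None)],
    ("mouse", "microrna-seq"): [("ENCSR536KRM", None)],
    ("mouse", "wgbs"): [("ENCSR923XUC", None)],
}


def lookup_reference_file_sets(organism: str, pipeline: str) -> List[Tuple[str, Optional[str]]]:
    """Return the (accession, variant) pairs listed for an organism and pipeline."""
    # Keys are stored lowercased, so the lookup ignores case and outer whitespace.
    key = (organism.strip().lower(), pipeline.strip().lower())
    if key not in REFERENCE_FILE_SETS:
        raise KeyError(f"no reference file set listed for {organism!r} / {pipeline!r}")
    return list(REFERENCE_FILE_SETS[key])


if __name__ == "__main__":
    for accession, variant in lookup_reference_file_sets("Human", "ChIP-seq"):
        print(accession, variant)
```

ChIP-seq is the only pipeline with more than one file set per organism, which is why each value is a list rather than a single accession.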

 

Genome assemblies:

Reference File                                          Description
------------------------------------------------------  -----------
GRCh38_no_alt_analysis_set_GCA_000001405.15 [download]  GRCh38 XY reference genome (ENCODE3 used only one reference genome for analysis)
female.hg19 [download]                                  hg19 XX reference genome (ENCODE2 used sex-specific genomes for analysis)
male.hg19 [download]                                    hg19 XY reference genome (ENCODE2 used sex-specific genomes for analysis)
mm10_no_alt_analysis_set_ENCODE [download]              mm10 XY reference genome (ENCODE3 used only one reference genome for analysis)
female.mm10 [download]                                  XX mm10 "minimal", which has been replaced by mm10_no_alt_analysis_set_ENCODE.fasta
male.mm10 [download]                                    XY mm10 "minimal", which has been replaced by mm10_no_alt_analysis_set_ENCODE.fasta

 

Genome annotations:

Reference File                                   Description
-----------------------------------------------  -----------
ENCFF159KBI [download]                           GRCh38 GENCODE V29 merged annotations gtf file
ENCFF824ZKD [download] & ENCFF316JQJ [download]  GRCh38 GENCODE V24 gtf and tar files
gencode.v19.annotation [download]                hg19 GENCODE V19 gtf file
ENCFF871VGR [download]                           mm10 GENCODE VM21 merged annotations gtf file
gencode.vM7.annotation [download]                mm10 GENCODE M7 gtf file
gencode.vM4.annotation [download]                mm10 GENCODE M4 gtf file

 

Additional useful files for uniform processing pipelines:

Reference File          Description
----------------------  -----------
ENCFF908UQN [download]  spike-in sequence used for RNA-seq analysis
lambda.fa [download]    phage λ wild type assembly J02459.1 (used as a methylation negative control)
ENCFF356LFX [download]  ChIP GRCh38 blacklist
ENCFF200UUD [download]  MINT-ChIP hg19 blacklist
ENCFF023CZC [download]  MINT-ChIP GRCh38 blacklist

Collection of References

Some experiments on the ENCODE portal have not been processed by the DCC uniform processing pipelines and may have used different reference files. The References search page includes all the reference datasets used by the different projects whose data can be found on the portal.
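When working with the identifiers on this page in a script, it can help to tell file-set accessions (ENCSR...) from individual file accessions (ENCFF...). The sketch below is illustrative only. The prefix interpretation is inferred from how the identifiers are used on this page, and both the base URL and the `/<accession>/` path form in `portal_url` are assumptions that should be checked against the portal before use. The function names are hypothetical.

```python
import re

# Pattern inferred from the accessions on this page:
# "ENC", a two-letter type code, three digits, three uppercase letters.
_ACCESSION_RE = re.compile(r"^ENC(SR|FF)\d{3}[A-Z]{3}$")

# Assumption: portal base URL; verify before relying on it.
PORTAL_BASE = "https://www.encodeproject.org"

# Interpretation taken from this page's usage, not from an official specification.
_KINDS = {"SR": "file set", "FF": "file"}


def accession_kind(accession: str) -> str:
    """Classify an identifier as a 'file set' (ENCSR) or a 'file' (ENCFF)."""
    match = _ACCESSION_RE.match(accession.strip())
    if match is None:
        # Names such as lambda.fa or gencode.v19.annotation are not accessions.
        raise ValueError(f"not an ENCSR/ENCFF accession: {accession!r}")
    return _KINDS[match.group(1)]


def portal_url(accession: str) -> str:
    """Build a portal URL for an accession (the path form is an assumption)."""
    accession_kind(accession)  # validates the identifier, raising ValueError if bad
    return f"{PORTAL_BASE}/{accession.strip()}/"


if __name__ == "__main__":
    for name in ("ENCSR938RZZ", "ENCFF159KBI"):
        print(name, accession_kind(name), portal_url(name))
```

Plain file names from the tables, such as lambda.fa, are rejected deliberately so that they are not mistaken for portal accessions.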

Updated January 05, 2021