Reference Sequences
Genome References
The ENCODE project uses reference genomes from NCBI or UCSC to provide a consistent framework for mapping high-throughput sequencing data. In general, ENCODE data are mapped consistently to two human genomes (GRCh38 and hg19) and two mouse genomes (mm9 and mm10) for historical comparability. Drosophila melanogaster experiments are mapped to either dm3 or dm6, and Caenorhabditis elegans experiments are mapped to ce10 or ce11. The official reference files for each uniform processing pipeline are listed in the table below, organized by organism and pipeline. In addition to the genome sequences (we generally use the "no alt" version of each genome), the table includes other essential files such as GENCODE transcript references, chromosome size files, and the phage lambda genome.
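Chromosome size files are simple two-column, tab-separated listings of sequence name and length. As a minimal sketch (not ENCODE's own tooling), such a listing can be derived from a genome FASTA in Python. It assumes UCSC-style sequence names, in which alternate-haplotype contigs end in "_alt" (for example chr1_KI270762v1_alt), and skips those contigs by default to mimic a "no alt" genome. The function names are hypothetical.

```python
import gzip
from typing import Dict, Iterable, TextIO


def open_text(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTA file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def chrom_sizes(lines: Iterable[str], drop_alt: bool = True) -> Dict[str, int]:
    """Compute sequence lengths from FASTA lines.

    Assumption: UCSC-style names, where alternate-haplotype contigs end
    in "_alt"; those records are skipped when drop_alt is True.
    """
    sizes: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The sequence name is the first whitespace-delimited token of the header.
            name = line[1:].split()[0]
            if drop_alt and name.endswith("_alt"):
                name = None  # ignore the residues of this record
            else:
                sizes[name] = 0
        elif name is not None:
            sizes[name] += len(line)
    return sizes


def write_chrom_sizes(sizes: Dict[str, int], out: TextIO) -> None:
    """Write a two-column, tab-separated chromosome sizes listing."""
    for name, length in sizes.items():
        out.write(f"{name}\t{length}\n")


if __name__ == "__main__":
    demo = [">chr1", "ACGT", "AC", ">chr1_KI270762v1_alt", "GGGG", ">chrM", "A"]
    print(chrom_sizes(demo))  # {'chr1': 6, 'chrM': 1}
```

For a real genome, something like `with open_text("genome.fa.gz") as fh: sizes = chrom_sizes(fh)` streams the file line by line, so the whole sequence never has to be held in memory (the file name is a placeholder).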
Reference File Sets
The table below lists the files used by each pipeline for uniform processing by the ENCODE DCC, with associated details on genome assembly and annotation where applicable. For convenience, the GRC genome assembly and GENCODE annotation files are also linked directly below. For further information, please contact encode-help@lists.stanford.edu.
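Files on the portal are identified by accessions such as ENCFF159KBI. As a sketch only, assuming the portal's `/files/<accession>/@@download/<accession>.<extension>` URL pattern and that the caller knows the file's extension (for example `gtf.gz` for a gzipped GTF, which is an assumption here), a download link could be built as follows. The accession check is inferred from the accessions listed on this page, not from a formal specification, and `download_url` is a hypothetical helper.

```python
import re

ENCODE_BASE = "https://www.encodeproject.org"
# Pattern inferred from accessions on this page (ENCFF159KBI, ENCFF871VGR, ENCFF023CZC).
FILE_ACCESSION = re.compile(r"^ENCFF\d{3}[A-Z]{3}$")


def download_url(accession: str, extension: str) -> str:
    """Build a portal download URL for an ENCODE file accession.

    `extension` (e.g. "gtf.gz") must be supplied by the caller and is
    assumed to match the file's actual format on the portal.
    """
    if not FILE_ACCESSION.match(accession):
        raise ValueError(f"not an ENCODE file accession: {accession!r}")
    ext = extension.lstrip(".")
    return f"{ENCODE_BASE}/files/{accession}/@@download/{accession}.{ext}"
```

The resulting URL can then be fetched with `urllib.request.urlretrieve(url, filename)` or any other HTTP client.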
ENCODE4 Uniform Processing Pipeline Filesets
Organism | Pipeline(s) | Reference file set | Genome assembly & annotation
Homo sapiens | | | GRCh38 with GENCODE version V29
Homo sapiens | | | GRCh38 with GENCODE version V29
Homo sapiens | | | GRCh38
Homo sapiens | | | GRCh38
Homo sapiens | | | GRCh38
Homo sapiens | | | GRCh38 with GENCODE version V29
Homo sapiens | | | GRCh38 with GENCODE version V29
Homo sapiens | | | GRCh38
Mus musculus | | | mm10 with GENCODE version M21
Mus musculus | | | mm10 with GENCODE version M21
Mus musculus | | | mm10
Mus musculus | | | mm10
Mus musculus | | | mm10
Mus musculus | | | mm10 with GENCODE version M21
Mus musculus | | | mm10 with GENCODE version M21
Mus musculus | | | mm10
Genome assemblies:
Reference File | Description
| GRCh38 XY reference genome (ENCODE3 used only one reference genome for analysis)
| hg19 XX reference genome (ENCODE2 used sex-specific genomes for analysis)
| hg19 XY reference genome (ENCODE2 used sex-specific genomes for analysis)
| mm10 XY reference genome (ENCODE3 used only one reference genome for analysis)
| mm10 XX "minimal" reference genome, which has been replaced by mm10_no_alt_analysis_set_ENCODE.fasta
| mm10 XY "minimal" reference genome, which has been replaced by mm10_no_alt_analysis_set_ENCODE.fasta
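The XX and XY genomes listed above differ in whether Y-chromosome sequence is present. As a hypothetical sketch, assuming an XX genome is simply the corresponding XY FASTA with its chrY record removed (this page does not describe how the ENCODE2 sex-specific genomes were actually built), a streaming filter that drops named FASTA records could look like this. The helper name is illustrative.

```python
from typing import Iterable, Iterator, Set


def drop_records(lines: Iterable[str], exclude: Set[str]) -> Iterator[str]:
    """Yield FASTA lines, omitting every record whose name is in `exclude`.

    The record name is taken as the first token after '>'. Assumption:
    deriving an XX genome means excluding {"chrY"} from an XY FASTA.
    """
    keep = True
    for line in lines:
        if line.startswith(">"):
            name = line[1:].split()[0]
            keep = name not in exclude
        if keep:
            yield line
```

Because it is a generator, it can be chained directly from an open input file into `out.writelines(...)` without loading the genome into memory.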
Genome annotations:
Reference File | Description
ENCFF159KBI | GRCh38 GENCODE V29 merged annotations GTF file
| GRCh38 GENCODE V24 GTF and tar files
| hg19 GENCODE V19 GTF file
ENCFF871VGR | mm10 GENCODE M21 merged annotations GTF file
| mm10 GENCODE M7 GTF file
| mm10 GENCODE M4 GTF file
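The GENCODE annotation files above are in GTF format: nine tab-separated columns with 1-based, inclusive coordinates, where the last column holds `key "value";` attribute pairs. A minimal Python sketch that parses one line and lists gene types might look like the following. It relies on the `gene_type` key used in GENCODE GTFs, the helper names are hypothetical, and it does not cover every GTF edge case (repeated keys such as `tag` keep only their last value, and unquoted values such as `level 2` are ignored).

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# Matches `key "value"` pairs in the ninth (attribute) column.
_ATTR = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_line(line: str) -> Tuple[str, str, int, int, Dict[str, str]]:
    """Return (seqname, feature, start, end, attributes) for one GTF line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated GTF columns")
    attrs = dict(_ATTR.findall(fields[8]))
    return fields[0], fields[2], int(fields[3]), int(fields[4]), attrs


def gene_types(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (gene_id, gene_type) for each 'gene' feature, skipping comments."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        _, feature, _, _, attrs = parse_gtf_line(line)
        if feature == "gene":
            yield attrs["gene_id"], attrs.get("gene_type", "")
```

A compressed annotation file can be streamed through `gene_types` after opening it with `gzip.open(path, "rt")`.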
Additional useful files for uniform processing pipelines:
Reference File | Description
| spike-in sequences used for RNA-seq analysis
| phage λ wild-type assembly J02459.1 (used as a methylation negative control)
| ChIP GRCh38 blacklist
| MINT-ChIP hg19 blacklist
ENCFF023CZC | MINT-ChIP GRCh38 blacklist
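Blacklists are BED files of genomic regions that tend to produce artifactual signal, and peaks falling in them are usually discarded. As a minimal sketch of that overlap test (not the code the ENCODE pipelines run), the class below indexes blacklist regions already loaded as (chrom, start, end) tuples using BED's 0-based, half-open convention, and a helper filters peaks against it. The class and function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style 0-based half-open


class Blacklist:
    """Sorted, merged blacklist regions supporting fast overlap queries."""

    def __init__(self, regions: Iterable[Interval]) -> None:
        by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
        for chrom, start, end in regions:
            by_chrom[chrom].append((start, end))
        self._starts: Dict[str, List[int]] = {}
        self._ends: Dict[str, List[int]] = {}
        for chrom, ivs in by_chrom.items():
            ivs.sort()
            merged: List[List[int]] = []
            for s, e in ivs:
                if merged and s <= merged[-1][1]:
                    merged[-1][1] = max(merged[-1][1], e)  # extend overlapping region
                else:
                    merged.append([s, e])
            self._starts[chrom] = [s for s, _ in merged]
            self._ends[chrom] = [e for _, e in merged]

    def overlaps(self, chrom: str, start: int, end: int) -> bool:
        """True if the half-open interval [start, end) overlaps any region."""
        ends = self._ends.get(chrom)
        if not ends:
            return False
        # Merged regions are disjoint and sorted, so their ends are sorted too;
        # the first region ending after `start` is the only possible overlap.
        i = bisect_right(ends, start)
        return i < len(ends) and self._starts[chrom][i] < end


def filter_peaks(peaks: Iterable[Interval], blacklist: Blacklist) -> List[Interval]:
    """Keep only peaks that do not overlap any blacklist region."""
    return [p for p in peaks if not blacklist.overlaps(*p)]
```

Merging the regions up front keeps each query to a single binary search per peak, which matters when filtering hundreds of thousands of peaks.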
Collection of References
Some experiments on the ENCODE portal were not processed by the DCC uniform processing pipelines and may have used different reference files. The References search page lists all reference datasets used by the projects whose data are available on the portal.
Updated January 05, 2021