Data standards

Assays

The ENCODE Consortium uses several epigenomic assays and has developed protocols and guidelines for the assays listed below. Each link provides an overview of the assay, its experimental guidelines, the requirements for processing on the uniform pipelines, and the quality metrics applied to that assay type.

DNA binding

ChIP-seq for Histones
ChIP-seq for Transcription Factors
Previous ENCODE3 standards pages can be found at:
- ChIP-seq for Histones
- ChIP-seq for Transcription Factors

DNA accessibility

ATAC-seq
- Previous ENCODE3 pipeline: ATAC-seq
DNase-seq
Previous ENCODE3 standards pages can be found at:
- DNase-seq (also used for genetic modification DNase-seq)

DNA methylation

WGBS (ENCODE 4)
- WGBS

3D chromatin structure

ChIA-PET
Hi-C

Transcription

bulk RNA-seq (used for total RNA-seq, poly(A)+ RNA-seq, poly(A)- RNA-seq, CRISPR RNA-seq, CRISPRi RNA-seq, shRNA knockdown RNA-seq, and siRNA knockdown RNA-seq)
- Previous ENCODE3 standards can be found at bulk RNA-seq
long read RNA-seq
microRNA counts
microRNA-seq (ENCODE 4)
RAMPAGE and CAGE
small RNA-seq

RNA binding

eCLIP
RNA Bind-N-Seq

Experimental guidelines

The ENCODE Consortium has adopted shared experimental guidelines for the most common ENCODE assays. The guidelines have evolved over time as technologies have changed, and current guidelines are informed by results gathered during the project. The ENCODE Consortium has also developed a set of antibody characterization standards to address the problems of specificity and reproducibility that are characteristic of antibody-based assays. Previous versions of all guidelines are archived and available for reference.

Quality metrics

The ENCODE Consortium analyzes the quality of the data it produces using a variety of metrics. Quality metrics for evaluating epigenomic assays are an active area of research; standards emerge as metrics are applied to more datasets and more types of experiments. The typical values for a quality metric can vary among assays, or even among features within the same assay, such as the antibodies used in ChIP-seq experiments. Currently, no single measurement identifies all high-quality or low-quality samples. As with quality control for other types of experiments, multiple assessments (including manual inspection of tracks) are useful because different assessments may capture different concerns. Comparisons within an experimental method (for example, comparing replicates to each other, comparing values for one antibody in several cell types, or comparing the same antibody and cell type between different labs) can help identify possible stochastic error.
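As a rough illustration of comparing replicates to each other, the sketch below computes a Pearson correlation between two lists of per-region signal values. The function name `pearson` and the input layout (one value per genomic region, in the same order for both replicates) are assumptions made for this example; this is not the concordance measure computed by any specific ENCODE pipeline.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers.

    Hypothetical helper for illustration: x and y hold signal values for
    the same regions, in the same order, from two replicates.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Sums of cross-products and squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)

    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")

    return cov / math.sqrt(var_x * var_y)
```

A value near 1 suggests the two replicates agree closely. As noted above, no single number is sufficient on its own, so a check like this would be one assessment among several.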

As part of the third phase of ENCODE, uniform analysis pipelines were developed for the major assay types; each pipeline produces a set of data quality metrics. Many of the software tools used to calculate quality metrics are listed, with their citations, on the Software Tools page, while the Terms and Definitions page describes the individual metrics. The ENCODE Consortium uses these measures to set standards detailing the criteria for excellent, passable, and poor data. On the ENCODE portal, data that do not meet the minimum cutoff values are flagged according to the severity of the error; examples of errors include low read depth, poor replicate concordance, and low correlation.
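The idea of grading a metric against cutoffs can be sketched as follows. This is a minimal illustration only: the class `MetricStandard`, the function `grade`, and the numeric cutoffs are hypothetical placeholders, not ENCODE's actual standards, which differ by assay and are given on each assay's page. The sketch also assumes a metric where higher values are better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricStandard:
    """Hypothetical cutoffs for one quality metric (higher is better)."""
    name: str
    minimum: float    # below this value, the data are graded "poor"
    excellent: float  # at or above this value, the data are graded "excellent"


def grade(standard: MetricStandard, value: float) -> str:
    """Map an observed metric value to one of three tiers."""
    if value >= standard.excellent:
        return "excellent"
    if value >= standard.minimum:
        return "passable"
    return "poor"


# Placeholder cutoffs invented for this example; not real ENCODE values.
read_depth = MetricStandard("read_depth_millions", minimum=10.0, excellent=20.0)

for observed in (25.0, 12.0, 4.0):
    print(read_depth.name, observed, grade(read_depth, observed))
```

In practice each assay would carry its own set of standards, and a dataset would be graded on several metrics rather than one.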

Older standards for datasets published as part of the ENCODE integrative analysis publications in 2012 can be found on the quality metrics page associated with the publication.

Click through for information on reference sequences for the Uniform Processing Pipelines.

Updated April 2019