Data standards

Overview

The ENCODE Consortium assesses the quality of the data it produces using a variety of metrics. This page describes the data standards and metrics used to evaluate the data, and what each metric appears to measure. The quality metrics are updated occasionally to include analysis of more recent data.

Developing quality metrics for epigenomic assays is an active area of research, so standards continue to emerge as more metrics are applied to more datasets and types of experiments. Typical values for a quality metric can differ substantially between assays, and even between features within the same assay, such as different antibodies used in ChIP-seq experiments. Currently, no single measurement identifies all high-quality or low-quality samples. As with quality control for other types of experiments, multiple assessments (including manual inspection of tracks) are useful because they may capture different concerns. Comparisons within an experimental method (e.g., comparing replicates to each other, comparing values for one antibody in several cell types, or comparing the same antibody and cell type in different labs) can help identify possible stochastic error.
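
As an illustration of the within-method comparison described above, the following Python sketch flags replicates whose value for a single quality metric lies far from the rest of the group. It is not an ENCODE procedure: the function name, the median-absolute-deviation rule, the tolerance of 3, and the example values are all assumptions chosen for illustration.

```python
from statistics import median


def flag_outlying_replicates(values, tolerance=3.0):
    """Return the names of replicates whose metric is far from the group median.

    `values` maps a replicate name to one quality-metric value (hypothetical).
    A replicate is flagged when its distance from the median exceeds
    `tolerance` times the median absolute deviation (MAD) of the group.
    """
    if len(values) < 3:
        # With fewer than three replicates there is no meaningful "group".
        return []

    center = median(values.values())
    mad = median(abs(v - center) for v in values.values())

    if mad == 0:
        # Most replicates agree exactly; flag any value that differs at all.
        return [name for name, v in values.items() if v != center]

    return [
        name
        for name, v in values.items()
        if abs(v - center) > tolerance * mad
    ]


# Hypothetical metric values for four replicates of one experiment.
example = {"rep1": 0.80, "rep2": 0.82, "rep3": 0.81, "rep4": 0.35}
print(flag_outlying_replicates(example))  # ['rep4']
```

A flagged replicate is only a candidate for closer inspection. Because typical values differ between assays and antibodies, any such tolerance would need to be chosen per assay, and the result should be read alongside other assessments such as manual inspection of tracks.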

Experimental guidelines

The ENCODE Consortium has adopted uniform guidelines for the most common ENCODE experiments. The guidelines have evolved over time as technologies have changed. The current guidelines are informed by results gathered during the project. Previous versions of the standards are also available for reference.

Quality metrics

Quality metrics generated for the datasets published as part of the ENCODE integrative analysis publications in 2012 can be found on the quality metrics page associated with the publication.

Standards for ENCODE3

As part of the current phase of ENCODE, uniform pipelines are being developed for the major data types. Each pipeline comes with a set of standards that distinguish excellent, passable, and poor data quality. The following links, updated as the standards evolve, describe the current standards for each pipeline:

ChIP-seq

Long RNA-seq

RAMPAGE

microRNA-seq

ATAC-seq

ChIA-PET

WGBS

Small RNA-seq

eCLIP

microRNA counts

DNase-seq

RNA Bind-N-Seq

Software tools

Many of the software tools used for quality metrics, along with their citations, can be found on the Software Tools page.

For questions about ENCODE quality metrics, please contact Mike Pazin, NHGRI.