Access to ENCODE data

Data Coordination Center (DCC)

All data generated by the ENCODE Consortium are submitted to the DCC and made available from the ENCODE portal (http://www.encodeproject.org). The data are reviewed for quality and released to the scientific community. The ENCODE portal serves as the primary source of ENCODE data and of information about the ENCODE Consortium. It provides the following features for accessing, viewing, and downloading ENCODE data:

  • Search metadata: The metadata describing the assays can be searched by entering any string into the text box in the upper right corner of the page or via the faceted browser. Select "Assays", "Biosamples", or "Antibodies" located below the "Data" menu in the toolbar to browse those data types. 
  • Visualize data: When data suitable for visualization are available, a "Visualize Data" button under the Files section of the assay page launches a Genome Browser track hub.
  • Download data: All released data are publicly available for download. Bulk downloads of the data, and of the metadata associated with the files, can be performed through programmatic access to the ENCODE REST API.
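
As a rough illustration of programmatic access, the sketch below builds a JSON search request against the portal using only the Python standard library. Several details are assumptions not stated on this page: the https form of the portal address, the /search/ path, and the type, searchTerm, limit, and format parameters. The function names are hypothetical. Check the REST API help pages before relying on any of them.

```python
import json
import urllib.parse
import urllib.request

# Assumption: https form of the portal address given on this page.
PORTAL = "https://www.encodeproject.org"


def build_search_request(search_term, object_type="Experiment", limit=25):
    """Build (but do not send) a JSON search request for the portal.

    Assumption: the /search/ path and the query parameter names used
    here are illustrative and are not taken from this page.
    """
    query = urllib.parse.urlencode({
        "type": object_type,
        "searchTerm": search_term,
        "limit": limit,
        "format": "json",
    })
    url = f"{PORTAL}/search/?{query}"
    # Ask for JSON explicitly instead of the HTML page.
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def fetch_json(request, timeout=30):
    """Send the request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Building the request separately from sending it makes it possible to inspect the URL (for example, build_search_request("CTCF").full_url) before any network call is made.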

Information on how to use the portal can be found on the Getting Started help page.

Alternate providers of ENCODE data

ENCODE data are also available through other genomics portals. Depending on its purpose, each of these resources may include only a subset of the entire ENCODE corpus.

Visualize ENCODE data with other genomic annotations:

Search for ENCODE data along with data from other consortia:

Data repositories for ENCODE data:

  • GEO: Processed data are deposited at GEO.
  • SRA and ENA: Raw sequence data are deposited in the sequence read archives.

 

Non-ENCODE data hosted at this site

In addition to the ENCODE-produced data, this site also hosts data from other projects in a variety of arrangements. When searching the data, one can use the Projects facet to choose one of: ENCODE, ROADMAP, modENCODE, modERN, or GGR. These terms can also be used in programmatic searches using award.project=NAME.
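
A programmatic search restricted to one project could be sketched as follows. The award.project=NAME filter and the five project names come from the paragraph above. The https portal address, the /search/ path, and the type and format parameters are assumptions, and project_search_url is a hypothetical helper, not part of any ENCODE library.

```python
import urllib.parse

# Project names as listed in the text above; exact spelling and case
# are taken from this page and may need checking against the portal.
PROJECTS = ("ENCODE", "ROADMAP", "modENCODE", "modERN", "GGR")


def project_search_url(project, object_type="Experiment"):
    """Return a search URL filtered with award.project=NAME.

    Assumption: the /search/ path and the "type" and "format"
    parameters are illustrative and are not taken from this page.
    """
    if project not in PROJECTS:
        raise ValueError(f"unknown project: {project!r}")
    query = urllib.parse.urlencode({
        "type": object_type,
        "award.project": project,
        "format": "json",
    })
    return f"https://www.encodeproject.org/search/?{query}"
```

Rejecting names outside the listed set catches typos locally rather than returning an empty result from the server.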

Roadmap Epigenomics Mapping Consortium

The Roadmap Epigenomics Mapping Consortium (Roadmap) was an NIH-funded project comparable to ENCODE. The Roadmap metadata have been fully incorporated into this site, including the standardization of terms and ontologies, so the experimental metadata can be searched and explored alongside the ENCODE data. However, the raw data are still housed at dbGaP and the hg19 processed data are housed at GEO. Each Roadmap experiment has a link to the corresponding GEO submission.

modENCODE

The modENCODE consortium identified various elements in the genomes of C. elegans and Drosophila, including transcriptomes, transcription factor binding sites, histone modifications, and the occupancy of histone variants and histone-modifying factors. The consortium completed this work in 2012.

modERN

The model organism encyclopedia of regulatory networks (modERN) project focuses on identification of additional transcription factor binding sites in C. elegans and Drosophila. The goal of the project is to comprehensively map the binding sites for as many factors as possible. This project is ongoing.

Genomics of Gene Regulation

The Genomics of Gene Regulation (GGR) project is a new NIH initiative. The GGR data will be housed entirely at this site, including the metadata, raw data, and processed data.