Integrative Analysis of ENCODE Data

Introduction

The major goal of the ENCODE project is to identify all functional elements in the human genome sequence, where a functional element is defined as a discrete region of the genome that encodes a reproducible biochemical signature. ENCODE data production groups generate data and submit it to the ENCODE Data Coordinating Center (DCC) for quality control and release. A cross-consortium effort to perform an integrated analysis of all the data types, and to produce useful integrative interpretations of the data for the community, has now been completed. The results of these analyses have been published as the ENCODE integrative analysis publication package. This page describes a series of resources associated with the integrative analysis of ENCODE data.

Analysis tools

ENCODE analysis virtual machine

The supplementary information for the ENCODE integrative analysis Nature publication includes a set of code bundles that provide the scripts and processing steps corresponding to the methodology used in the analyses associated with the paper. Using these code bundles, the analysis group has established an ENCODE virtual machine instance of the software, in which each analysis program has been tested and run. The virtual machine is freely available to anyone who wants to work with the data and tools used in the integrative analysis.

 

Software tools

A page describing the software tools used in the ENCODE project is provided on the ENCODE portal.

Data standards and quality metrics

As part of the integrative analysis, the ENCODE project has established a number of standards. Details of each set of standards are available at the following pages:

Data

Data Coordination Center resources

All ENCODE data used for these publications, like all production data generated by the ENCODE consortium, is submitted to the DCC. The data is reviewed for quality and released to the scientific community. The DCC maintains the ENCODE portal, which provides access to this data.

Analysis data hub

The integrative analysis process has been a distributed effort by many groups. Individual analysts downloaded and processed files from the ENCODE download site, and created intermediate and final analysis products in various forms. Now that the analysis has been completed, the analysis data is being made available for viewing and downloading through a UCSC public data hub. This data hub includes descriptions of ENCODE data in uniformly processed signal and element representations, as well as genome segmentations. The ENCODE downloads page includes an Analysis Hub section that provides access to files on the hub. Click here to visualize the ENCODE Integrative Analysis Data Hub in the UCSC Genome Browser.

Analysis FTP site

Access to the analysis products is also provided via anonymous FTP from the EBI ENCODE analysis FTP server. This site contains an organized file structure with the ENCODE analysis datasets located in subdirectories within the byDataType directory.
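
As a rough illustration of browsing that layout, the following Python sketch lists the entries directly under byDataType over anonymous FTP, using the standard ftplib module. The server address and the full path to byDataType are not given on this page, so FTP_HOST and ANALYSIS_ROOT are placeholders to be replaced with the real values, and the function names are hypothetical.

```python
from ftplib import FTP

# Assumption: placeholder host; this page does not state the server address.
FTP_HOST = "ftp.example.org"
# Directory named in the text; its parent path on the server is an assumption.
ANALYSIS_ROOT = "byDataType"


def list_data_types(ftp, root=ANALYSIS_ROOT):
    """Return the sorted names of the entries directly under `root`."""
    entries = ftp.nlst(root)
    # Some servers return full paths ("byDataType/x"), others bare names ("x"),
    # so keep only the last path component of each entry.
    return sorted({entry.rstrip("/").rsplit("/", 1)[-1] for entry in entries})


def fetch_data_types(host=FTP_HOST, root=ANALYSIS_ROOT):
    """Connect anonymously to `host` and list the entries under `root`."""
    with FTP(host) as ftp:
        ftp.login()  # no arguments means an anonymous login
        return list_data_types(ftp, root)
```

Calling fetch_data_types() with the real host name would return one name per data type directory; individual files could then be retrieved with ftplib's retrbinary method.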