Getting Started

Introduction

Welcome to the ENCODE Portal! The ENCODE Portal serves as the primary source for data generated by the ENCODE Consortium and for up-to-date information about the ENCODE Project including data releases, publications, and ENCODE Portal tutorials. This site is developed and maintained by the Data Coordination Center (ENCODE DCC). All datasets generated by the ENCODE Consortium are submitted to the DCC and reviewed for quality and adherence to data standards prior to release to the scientific community. No account is needed to view released data.

This document describes what information and data are available on the ENCODE Portal, ways to get started searching and downloading data, and an overview of how the metadata describing the assays and reagents are organized. ENCODE data can be visualized and accessed from other resources, including the UCSC Genome Browser and ENSEMBL.

If you have any further questions, please contact the ENCODE DCC via email (encode-help@lists.stanford.edu) or Twitter (@encodedcc).

Information available at the portal

The ENCODE Portal contains the following types of data generated by the ENCODE Consortium:

Additional information about the activities of the ENCODE Consortium is provided on the portal:

Using the portal

Browse | Text Search | Region Search

Programmatic access | Download files | Visualize data

Browse and filter data

Clicking the “Search” option located in the “Data” toolbar menu brings up a list of all available experiments that have been used to generate ENCODE data. These results can be narrowed and filtered by selecting one or more values in a metadata category, also referred to as a "facet," on the left-hand side of the page. Multiple facet values, in the same or different categories, can be selected at any one time to generate more specific queries. To exclude a facet value, hover the cursor over it and click the red icon that appears to its right.

Figure 1. Clicking “Data” or “Materials & Methods” in the toolbar will show the dropdown menus, where links to the Search, Antibodies, and Biosamples pages are located.
Figure 2. The default experiment search page. (A) The sidebar containing facets and facet values. (B) (L to R) Report, matrix, and summary buttons to change the display of the search results. Also of note is the “Download” button to the right which allows users to batch download files (see Download Files below). The “Visualize” button (see Visualize Data below) also appears to the right once facets have been selected to filter the number of results to 100 or fewer. (C) The list of search results. Because only the “released” status is selected, this list contains every released experiment. The list will be further filtered once other facet values are selected - for example, only ChIP-seq experiments will be shown if “ChIP-seq” is selected in the “Assay” facet category.

By default, the Search page shows the results in List view. Along the top of the list, there are three buttons, each of which displays the results in a different format:

  • Report view: displays results as rows in a table, with a default selection of metadata categories as the columns. To change which categories are shown, click the “Columns” button above the table and select the desired categories in the pop-up window. To filter the results, click facet values in the sidebar.
  • Matrix view: displays results in a matrix, organized with biosamples along the y-axis and assay type along the x-axis. Facets also appear beside the x- and y-axes and function as they do in the List view. Clicking any value in the matrix will redirect to a list view search with the corresponding biosample and assay facet values selected.
  • Summary view: displays a general overview of the data in chart form based on the labs, assays, and status of the experiment.

Search pages also exist for biosamples and antibody lots. To view these, click the “Materials & Methods” toolbar menu and then select “Biosamples” or “Antibodies”, respectively. These search pages function similarly to the Experiment search page.

Objects can also be filtered by their assigned status. Experiments and files with a status of "released," "archived," or "revoked" are currently available to the public. See the status terminology page for definitions of each status.

Search for data

The website can be searched by entering a search term in the search box located in the upper right-hand corner of the toolbar. This search box appears on every page. Example search terms include a biosample (e.g. "skin"), an assay name (e.g. "ChIP-seq"), or a protein target of an antibody (e.g. "CTCF"). The search results can be narrowed by data type (experiment, biosample, or antibody) and then further filtered using the metadata categories on the left-hand side (refer to the "Browse and filter data" section above).

Search by region

It is possible to search for experiments by region using Region Search. It accepts coordinates and gene names among other forms of region identifiers and returns experiments for which the input region intersects regions specified in the peaks file.

Programmatic access to the portal

In addition to web-based browsing and searching, the ENCODE portal can be accessed programmatically via the REST API. Instructions on how to browse and search for ENCODE data programmatically are provided in the REST API help document. In brief, all queries that can be performed via the web can be used as programmatic queries.
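As a rough illustration of that last point, the sketch below turns a set of facet selections into a search URL that a script could request. The `/search/` path, the `type`, `status`, and `assay_title` field names, and the `format=json` parameter are assumptions made for this example; the REST API help document remains the authority on the actual query syntax.

```python
from urllib.parse import urlencode

PORTAL = "https://www.encodeproject.org"


def build_search_url(facets, as_json=True):
    """Build a portal search URL from (field, value) pairs.

    The field names passed in are meant to mirror facet categories in the
    sidebar; the ones used below are illustrative assumptions.
    """
    params = list(facets)
    if as_json:
        # Assumed parameter asking the portal for JSON instead of HTML.
        params.append(("format", "json"))
    return f"{PORTAL}/search/?{urlencode(params)}"


url = build_search_url([
    ("type", "Experiment"),
    ("status", "released"),
    ("assay_title", "ChIP-seq"),
])
print(url)
```

Keeping the facets as a list of pairs, rather than a dictionary, lets the same field appear more than once, which matches selecting several values within one facet category.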

Download files

Files are named by their accession and contain file format information. Links to download individual files are available beside each file accession listed in the file section of each page that describes a single assay (see Fig. 3), as well as on each file's individual page. Files can be downloaded directly from the web page or the link can be copied to be downloaded elsewhere.

Via the wget command:

 > wget https://www.encodeproject.org/files/ENCFF002CTW/@@download/ENCFF002CTW.bed.gz

Via the curl command:

 > curl -O -L https://www.encodeproject.org/files/ENCFF002CTW/@@download/ENCFF002CTW.bed.gz
Figure 3. An example experiment page. (A) By default, the experiment page displays the Association Graph in the Files section of the page. To view the file table as shown here instead, please click on the “File details” tab. (B) A link to download each file listed in the table is shown beside the accessions. Clicking the accession (e.g. ENCFF002DYE) leads to the file's individual page, which also displays a download button.

There are multiple options for downloading more than one file. While browsing through experiments (see "Browse and filter data"), a "Download" button appears near the top of the page (see Fig. 2), which brings up a batch download pop-up window with instructions on how to download the files of all experiments found with the current query. There is also a cart-based method to bulk download data, which allows users to group arbitrary experiments together rather than using a query.

Another method relies on using programmatic access to obtain the direct URLs to the files from the data object representing them. Further information on how to retrieve JSON data objects can be found on the REST API help page.
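A simpler variant, for when the accessions and file formats are already known, is sketched below. It assumes that download links follow the same pattern as the wget and curl examples above (`/files/<accession>/@@download/<accession>.<extension>`). The helper names are hypothetical and the pattern is inferred from those two examples, so the URL stored in the file's JSON object should be preferred when it is available.

```python
PORTAL = "https://www.encodeproject.org"


def download_url(accession, extension):
    """Return a download URL modeled on the wget/curl examples above."""
    filename = f"{accession}.{extension}"
    return f"{PORTAL}/files/{accession}/@@download/{filename}"


def url_list(files):
    """Join the URLs for (accession, extension) pairs, one per line."""
    return "\n".join(download_url(acc, ext) for acc, ext in files)


# The accession and extension are taken from the wget example above.
print(url_list([("ENCFF002CTW", "bed.gz")]))
```

Saving the printed list to a text file gives something that can be handed to a downloader that accepts a file of URLs, such as `wget -i`.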

Please note that if ENCODE data is used in your publication or talk, the accessions of the datasets used should be cited in the paper, along with the most recent ENCODE Consortium publication. Complete guidelines are available on the Data Use Policy page.

Visualize data

To visualize a single experiment, navigate to the experiment's page and click the "Visualize" button located in the upper right-hand corner of the Files section (see Fig. 3). This will launch a Genome Browser view when there is data suitable for visualization. There is also a "Quick View" option, which opens an embedded visualization, shown in the image below. Files must be in bigBed or bigWig file format to be visualized as a track hub.

Figure 4. The Quick View visualization of assay ENCSR265ZXX. The experiment can also be visualized using the UCSC or ENSEMBL genome browsers.

The "Visualize" button also appears on Search and Matrix view pages once filtered to a maximum of 100 experiments, provided there are released experiments within that set. By clicking this, you can open a genome browser view with track hubs for each experiment in the search results, allowing you to visualize multiple experiments simultaneously.

Data organization

Metadata

The DCC, in collaboration with the labs performing the assays and the Data Analysis Center (DAC), has defined a set of metadata that are used to help describe the experimental conditions that were used to generate the data, processing steps that were performed to analyze and interpret the data, and metrics to evaluate the quality and reproducibility of the data. These metadata are displayed on the pages that describe the assays, biosamples, and antibodies.

Accessions

ENCODE DCC creates accessions for metadata that can be reused in experimental protocols and computational analyses. This ensures that the exact assay or reagent is being referred to when assays are being discussed or files are being analyzed. The accessions are in the format ENC(SR|BS|DO|AB|LB|FF)[0-9]{3}[A-Z]{3}, where (SR|BS|DO|AB|LB|FF) is a two-letter code for the type of metadata being accessioned. Accessions are given to the following types of metadata:

  • An assay: Each assay is given an accession. Typically, the replicates will be performed using the same method, performed on the same kind of biosample, and investigating the same target. Assays may contain one or more biological replicates. A sample accession for an assay is ENCSR000DVI.
  • A biosample: An accessioned biosample refers to a tube or sample of that biological material that is being used. For example, the following would all be given a biosample accession: (1) a batch of a cell line grown on a specific day, (2) the isolation of a primary cell culture on a specific day, or (3) the dissection of a tissue sample on a specific day. A sample accession for a biosample is ENCBS046RNA.
  • A strain or donor: Every strain (for model organisms) and donor (for humans) is given a donor accession. This accession allows multiple samples obtained from a single donor to be grouped together. The donor information is listed with the biosample: ENCBS046RNA.
  • An antibody lot: Each unique antibody lot is accessioned so that assays can refer specifically to that antibody. Each antibody lot is also associated with characterizations for its target in a species.
  • A library: A unique library that can be resequenced is accessioned to ensure the correct files are associated with the nucleic acid material that has been created from the biosample. The library accession and experimental details of how the library is constructed are displayed on the assay page: ENCSR000DVI.
  • A file: Each data file is accessioned. This accession is used as the file name, along with its file format as an extension. The file accession is associated with the contents of that file. When a new file is submitted to replace an existing file, the new file is given a new accession and related to the older file. Files are displayed at the bottom of an assay page: ENCSR000DVI.
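The accession format lends itself to a mechanical check. The sketch below writes it as a Python regular expression, with the two-letter codes as a group of alternatives, and maps each code to a metadata type. Only the SR, BS, and FF pairings are backed by the sample accessions on this page; the DO, AB, and LB pairings are inferred from the order of the list above and should be treated as assumptions. The function name `accession_type` is hypothetical.

```python
import re

# Two-letter codes from the accession format described above.
# SR, BS, and FF match sample accessions on this page; the other
# pairings are inferred from the list order and may be wrong.
ACCESSION_TYPES = {
    "SR": "assay",
    "BS": "biosample",
    "DO": "strain or donor",
    "AB": "antibody lot",
    "LB": "library",
    "FF": "file",
}

ACCESSION_RE = re.compile(r"ENC(SR|BS|DO|AB|LB|FF)[0-9]{3}[A-Z]{3}")


def accession_type(accession):
    """Return the metadata type for a well-formed accession, else None."""
    match = ACCESSION_RE.fullmatch(accession)
    if match is None:
        return None
    return ACCESSION_TYPES[match.group(1)]


for acc in ("ENCSR000DVI", "ENCBS046RNA", "ENCFF002CTW", "not-an-accession"):
    print(acc, accession_type(acc))
```

Using `fullmatch` rather than `search` rejects strings that merely contain an accession, which is usually what is wanted when validating input.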

Ontologies

The DCC uses a set of controlled vocabularies and ontologies to maintain consistent use of language when describing experimental metadata. The use of consistent language facilitates the integration of datasets from diverse projects and is essential to ensure all the correct results are returned when browsing or searching the metadata. The relations between ontology terms are used by the ENCODE DCC in order to group samples and experiments by higher-level terms, which results in organ and cell type facets seen when browsing data on the portal. To this end, the ENCODE DCC is using the following ontologies to capture specific metadata categories:

Data model

The metadata captured for the experimental assays and computational analyses are organized as objects that have a defined relationship to each other. In general, the data model is organized around ensuring that the replicate structure of the assays is represented along with the reagents, like biosamples, that were used in the assay (Figure 5). In addition, the assays are associated with the raw data that are generated from the assay and the processed data derived from these raw data. The assay accession serves to group all related replicates together.

Figure 5. Representation of the core of the ENCODE DCC data model.
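One way to picture these relationships is as a small set of linked record types. The sketch below is illustrative only: the class and field names are hypothetical, they are not the portal's schema, and the identifiers are placeholders rather than real accessions. It covers just the relationships described in the text: an assay groups replicates, each replicate uses a library made from a biosample, and files are either raw or derived from other files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Biosample:
    accession: str
    donor_accession: str  # strain or donor the sample came from


@dataclass
class Library:
    accession: str
    biosample: Biosample  # material the library was made from


@dataclass
class Replicate:
    biological_replicate: int
    library: Library


@dataclass
class DataFile:
    accession: str
    file_format: str
    # Accessions of the files this one was computed from; empty for raw data.
    derived_from: List[str] = field(default_factory=list)


@dataclass
class Assay:
    accession: str
    replicates: List[Replicate] = field(default_factory=list)
    files: List[DataFile] = field(default_factory=list)

    def raw_files(self):
        return [f for f in self.files if not f.derived_from]

    def processed_files(self):
        return [f for f in self.files if f.derived_from]


# Placeholder identifiers, not real ENCODE records.
biosample = Biosample("BIOSAMPLE-1", "DONOR-1")
library = Library("LIBRARY-1", biosample)
assay = Assay(
    accession="ASSAY-1",
    replicates=[Replicate(1, library)],
    files=[
        DataFile("FILE-1", "fastq"),
        DataFile("FILE-2", "bam", derived_from=["FILE-1"]),
    ],
)
print([f.accession for f in assay.raw_files()])
print([f.accession for f in assay.processed_files()])
```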

The ENCODE portal provides a formatted view of each data object, known as a profile page. Profile pages for the metadata model depicted in Figure 5 include the following:

The entire data model is available at the ENCODE DCC GitHub schema repository and visualized in SVG. Replace the object name in the profile URL to view the formatted schema.