Batch Download

Overview

Batch download of files is available from a search result. Once a set of experiments has been selected, click the "Download" button to download files.txt, a file that contains a list of URLs. The first URL in files.txt points to metadata.tsv, a file described below that contains all the experimental metadata for the files returned by the search. Each remaining URL in files.txt downloads one ENCFF-accessioned file.

Files

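The "files.txt" file can be copied to any server with curl installed. The following command downloads every file in the list, passing one URL at a time to curl (-L follows redirects, and -O saves each file under the last segment of its URL):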

xargs -n 1 curl -O -L < files.txt

Please be prepared to receive several gigabytes of data if your search includes many files.
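Where a Python script is more convenient than the shell pipeline, a rough standard-library equivalent is sketched below. It is an illustration under stated assumptions, not an official ENCODE tool: the function names are hypothetical, the first URL is saved as metadata.tsv (following the layout described in the Overview), and every other file is named after the last segment of its URL path, as curl -O does. Unlike the xargs command, it skips files that already exist, which makes it easier to resume a large download after an interruption.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def local_name(url: str) -> str:
    """Name a download after the last segment of the URL path, as `curl -O` does."""
    name = urlparse(url).path.split("/")[-1]
    if not name:
        raise ValueError(f"no file name in URL path: {url}")
    return name


def fetch(url: str, target: Path) -> None:
    """Download url to target via a temporary .part file, so an
    interrupted transfer is never mistaken for a finished one."""
    partial = target.with_name(target.name + ".part")
    # urlopen follows HTTP redirects by default, similar to `curl -L`.
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    partial.replace(target)


def download_all(list_path: str = "files.txt", dest: str = ".") -> None:
    """Download metadata.tsv (first URL) and every file URL after it,
    skipping files that already exist locally (hypothetical helper)."""
    urls = [u.strip() for u in Path(list_path).read_text().splitlines() if u.strip()]
    if not urls:
        return
    out_dir = Path(dest)
    out_dir.mkdir(parents=True, exist_ok=True)
    metadata_url, file_urls = urls[0], urls[1:]
    # Assumption: the first URL is always the metadata.tsv link.
    fetch(metadata_url, out_dir / "metadata.tsv")
    for url in file_urls:
        target = out_dir / local_name(url)
        if target.exists():
            continue  # already downloaded on an earlier run
        fetch(url, target)
```

For example, download_all("files.txt", "encode_data") places metadata.tsv and all listed files in an encode_data directory.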

Metadata

The first line of files.txt is a link to metadata.tsv, a file that contains metadata describing the assays and the files. The metadata.tsv file includes the following columns:

  • File accession: The unique accession for the file. Each file is assigned an accession (in the format ENCFF[0-9]{3}[A-Z]{3}), which can be used when searching the ENCODE Portal to identify its associated assay.
  • File format: The format of the downloaded file. The ENCODE Consortium uses a set of common file formats to ensure consistent representation of similar data types. Additional information about the file formats used by the ENCODE Consortium is available in the File Format help doc.
  • Output type: The output type provides additional information about the expected contents in the file.
  • Experiment accession: The unique accession of the experiment, which represents the assay performed on two biological replicates of the same biosample.
  • Assay: The name of the assay performed.
  • Biosample term id: The unique identifier of the ontology term used to describe the biosample. The EFO prefix refers to the Experimental Factor Ontology, the CL prefix to the Cell Ontology, and the UBERON prefix to the Uberon (Uber-anatomy) ontology.
  • Biosample term name: The human-readable ontology term name used to describe the biosample.
  • Biosample type: A categorization of biosamples into major groups. One of the following: whole organism, tissue, primary cell, immortalized cell line, in vitro differentiated cell, induced pluripotent stem cell, or stem cell.
  • Biosample life stage: A categorization of the age of the biosample being assayed.
  • Biosample sex: The biological sex of the biosample.
  • Biosample organism: The species of the biosample.
  • Biosample treatments: The name of the chemical or biological agent applied to a biosample in order to elicit a response.
  • Biosample subcellular fraction term name: The subcellular fraction of the biosample being assayed if the biosample has been fractionated.
  • Biosample phase: The cell cycle phase of the biosample being assayed if the biosample has been sorted.
  • Biosample synchronization stage: The stage at which the organism was synchronized. Used in conjunction with the time and time units post-synchronization.
  • Experiment target: The name of the gene product being investigated. The value in this column could be the target of an antibody in a ChIP-seq assay or the target of an shRNA in an shRNA knockdown assay.
  • Antibody accession: The accession of the antibody used in the assay. Searching for the antibody accession on the ENCODE Portal identifies the related antibody lot and the antibody characterizations performed. Additional information about antibody characterizations is available on the ENCODE Portal.
  • Library made from: The nucleic acid extracted from the biosample that is used to make the library. Typically DNA, RNA, or a specific type of RNA.
  • Library depleted in: Any nucleic acid, such as ribosomal RNA, that is removed before the library is made.
  • Library extraction method: The experimental method or kit used to extract the nucleic acid used to generate the library.
  • Library lysis method: The experimental method or kit used to lyse the biosample in order to extract the nucleic acid used to generate the library.
  • Library crosslinking method: The experimental method or kit used to crosslink the protein-nucleic acid complexes during the assay.
  • Experiment date released: The date the experiment was released to the public by the ENCODE Consortium.
  • Project: The name of the consortium or project that generated the data.
  • RBNS protein concentration: Used only with RNA Bind-n-Seq replicates, to indicate the protein concentration assayed. All concentrations are currently in nM, except for the first replicate of ENCSR329RIP, which is in pM.
  • Library fragmentation method: The experimental method or kit used to fragment the nucleic acid used to generate the library.
  • Library size range: The size range of the library fragments, inclusive of the insert and adaptors.
  • Biosample Age: The age of the biosample being assayed.
  • Biological replicate: The biological replicate number of the experiment.
  • Technical replicate: The technical replicate number within the biological replicate, as defined by the lab performing the assay.
  • Read length: The read length of the sequencing.
  • Run type: Whether the sequencing was performed as single-end or paired-end.
  • Paired end: Which read of the pair (1 or 2) the file contains (for paired-end libraries).
  • Paired with: The file containing the other read of this file's pair.
  • Derived from: The files used as inputs to the software that produced this output file.
  • Size: Size of the file in bytes.
  • Lab: The name of the principal investigator (PI) of the lab that generated the data.
  • md5sum: The MD5 checksum of the file, which serves as a unique identifier of its content at the time of compression (if applicable).
  • File download URL: The URL from which the file can be downloaded.
  • Assembly: The name of the reference genome assembly used in the processing of the file.
  • Platform: The name of the machine used to generate the data.
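Once the files are downloaded, the md5sum column offers a way to confirm that each transfer completed intact, and the Paired end and Paired with columns can be used to match the two reads of a paired-end library. The Python sketch below is a minimal illustration under stated assumptions, not ENCODE-provided code: it assumes metadata.tsv is tab-separated with a header row whose names match the columns listed above, that each file was saved under the last segment of its File download URL (as curl -O does), and that the mate's ENCFF accession appears somewhere in the Paired with value (its exact formatting is an assumption here). All function names are hypothetical.

```python
import csv
import hashlib
import re
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

# Documented file accession format: "ENCFF" + 3 digits + 3 uppercase letters.
ACCESSION_RE = re.compile(r"ENCFF[0-9]{3}[A-Z]{3}")


def load_metadata(path: str = "metadata.tsv") -> list[dict[str, str]]:
    """Read metadata.tsv into one dict per row, keyed by the header's column names."""
    with open(path, newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh, delimiter="\t"))


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_downloads(rows: list[dict[str, str]], download_dir: str = ".") -> dict[str, str]:
    """Map each problem file's accession to a short description of the problem."""
    problems: dict[str, str] = {}
    for row in rows:
        accession = row["File accession"]
        if not ACCESSION_RE.fullmatch(accession):
            problems[accession] = "unexpected accession format"
            continue
        # Assumption: the file was saved under the last segment of its download URL.
        name = PurePosixPath(urlparse(row["File download URL"]).path).name
        local = Path(download_dir) / name
        if not local.exists():
            problems[accession] = "missing"
        elif md5_of(local) != row["md5sum"].strip().lower():
            problems[accession] = "md5 mismatch"
    return problems


def read_pairs(rows: list[dict[str, str]]) -> list[tuple[str, str]]:
    """Return (read 1, read 2) accession pairs for paired-end files."""
    pairs = []
    for row in rows:
        if (row.get("Paired end") or "").strip() == "1":
            # Extract the mate's accession from "Paired with", whatever wrapping it has.
            mate = ACCESSION_RE.search(row.get("Paired with") or "")
            if mate:
                pairs.append((row["File accession"], mate.group()))
    return pairs
```

For example, check_downloads(load_metadata(), ".") returns an empty dict when every listed file is present in the current directory and matches its md5sum. Streaming the file in chunks keeps memory use flat even for multi-gigabyte files.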