Glossary
General terms
functional characterization data
Data generated by assays (e.g., STARR-seq, MPRA, and CRISPR screens) that investigate the relationship between DNA sequences and their regulatory activities.
functional genomics data
Data generated by assays investigating processes such as transcription, translation and epigenetic regulation on a genome-wide scale. Examples of assays generating functional genomics data: RNA-seq, ChIP-seq, DNase-seq, ATAC-seq, WGBS, HiC, and ChIA-PET.
File output types
alignments
The mapping locations of input reads with respect to a genome or other provided reference.
File formats: bam
conservative IDR thresholded peaks
In replicated experiments, the set of reproducible peaks from two replicates that pass an IDR threshold.
File formats: bed, bigBed
Additional information: Transcription Factor ChIP-seq, ATAC-seq
enrichment
Elements or regions that appear at a statistically elevated rate compared to the control or baseline.
File formats: tsv, csv, bed
Additional information: RNA Bind-N-Seq
exclusion list regions
A comprehensive set of regions that have anomalous, unstructured, or high signal in next-generation sequencing experiments independent of cell line or experiment.
File formats: bed
FDR cut rate
Genomic regions with statistically significant enrichments, or "hotspots", of DNase I cleavage activity at different false discovery rates.
File formats: bed, bigBed
fold change over control
Nucleotide-resolution signal coverage track, expressed as fold change over control at each position.
File formats: bigWig
Additional information: Histone ChIP-seq
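As a rough illustration of what a fold-change-over-control value means at each position, the sketch below divides treatment coverage by control coverage. The function name, the example coverage values, and the pseudocount are assumptions made for illustration; the pipeline that produces these files may smooth or scale the control differently.

```python
def fold_change_over_control(treatment, control, pseudocount=1.0):
    """Return per-position treatment/control ratios.

    `pseudocount` is an illustrative assumption that avoids division by zero;
    it is not taken from the glossary.
    """
    if len(treatment) != len(control):
        raise ValueError("coverage vectors must have the same length")
    return [(t + pseudocount) / (c + pseudocount)
            for t, c in zip(treatment, control)]


# Hypothetical per-base coverage for a four-base window.
treatment_cov = [0, 3, 9, 19]
control_cov = [0, 1, 4, 4]
fold_change = fold_change_over_control(treatment_cov, control_cov)
print(fold_change)
```

Values above 1 mark positions where the experiment has more signal than the control.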
footprints
Genomic sites delineating regions occupied by a protein or transcription factor and protected from degradation by an enzyme such as DNase I.
File formats: bed, bigBed
Additional information: DNase-seq
gene quantifications
Quantifications of reads (or read pairs, in paired-end sequencing) aligning to the gene annotation reference, reported as either raw or normalized counts.
File formats: tsv
Additional information: Small RNA-seq, Bulk RNA-seq
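To make the difference between raw and normalized counts concrete, here is a minimal sketch that converts raw per-gene counts to counts per million (CPM). CPM is only one possible normalization and is chosen here as an assumption; the actual files may report other units such as TPM or FPKM. The gene names and counts are hypothetical.

```python
def counts_per_million(raw_counts):
    """Scale raw per-gene counts so that they sum to one million."""
    total = sum(raw_counts.values())
    if total == 0:
        # No reads at all: every normalized value is zero.
        return {gene: 0.0 for gene in raw_counts}
    return {gene: count * 1_000_000 / total
            for gene, count in raw_counts.items()}


# Hypothetical raw counts for two genes.
raw = {"geneA": 250, "geneB": 750}
cpm = counts_per_million(raw)
print(cpm)
```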
genome index
A preprocessed form of the genome reference used to facilitate downstream analysis.
File formats: tar, tsv, gff
Additional information: RAMPAGE and CAGE, WGBS, Bulk RNA-seq
genome reference
A composite nucleic acid sequence assembled from the sequence of several different individual organisms representing the species.
File formats: fasta, tar, gff, gtf
Additional information: ENCODE Reference Sequences
hotspots
Genomic regions with statistically significant enrichments, or "hotspots", of DNase I cleavage activity.
File formats: bed, bigBed
Additional information: DNase-seq
IDR ranked peaks
The set of peak calls ranked by IDR score.
File formats: bed
Additional information: Transcription Factor ChIP-seq, ATAC-seq
IDR thresholded peaks
The set of peak calls that pass an IDR threshold, indicating statistical confidence that these are reproducible peaks.
File formats: bed, bigBed
Additional information: Transcription Factor ChIP-seq, ATAC-seq
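The relationship between IDR ranked peaks and IDR thresholded peaks can be sketched as a sort followed by a filter. The `Peak` record, its plain `idr` field, and the 0.05 cutoff are assumptions for illustration only; real peak files may encode the IDR value differently (for example as a scaled or log-transformed score), and the threshold is set by the pipeline.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    idr: float  # hypothetical field: lower means more reproducible


def idr_thresholded(peaks, threshold=0.05):
    """Keep peaks with IDR at or below `threshold`, most reproducible first."""
    kept = [p for p in peaks if p.idr <= threshold]
    return sorted(kept, key=lambda p: p.idr)


# Hypothetical peak calls.
calls = [
    Peak("chr1", 100, 300, 0.20),
    Peak("chr1", 900, 1200, 0.01),
    Peak("chr2", 50, 400, 0.04),
]
passing = idr_thresholded(calls)
print([(p.chrom, p.start, p.idr) for p in passing])
```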
library fraction
Estimates of the fraction of RBNS reads that are bound at different k-mers in an RBNS library, listed in descending order.
File formats: tsv
Additional information: RNA Bind-N-Seq
methylation state at CHG
The read depth and percent methylation at CHG sites.
File formats: bed, bigBed
Additional information: WGBS
methylation state at CHH
The read depth and percent methylation at CHH sites.
File formats: bed, bigBed
Additional information: WGBS
methylation state at CpG
The read depth and percent methylation at CpG sites.
File formats: bed, bigBed
Additional information: WGBS
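For the three methylation state entries above, the two reported quantities at a site can be sketched as follows, assuming read depth means the number of reads covering the site with a methylation call and percent methylation is the methylated share of those reads. The function name and the example counts are hypothetical.

```python
def methylation_state(methylated_reads, unmethylated_reads):
    """Return (read_depth, percent_methylation) for one cytosine site.

    Percent methylation is None when no reads cover the site.
    """
    depth = methylated_reads + unmethylated_reads
    if depth == 0:
        return 0, None
    return depth, 100.0 * methylated_reads / depth


# Hypothetical site covered by 15 methylated and 5 unmethylated reads.
site = methylation_state(15, 5)
print(site)
```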
microRNA quantifications
Counts (reads in single-ended or read pairs in paired-ended sequencing runs) that map to each microRNA gene in the reference annotation.
File formats: tsv, bed, bigBed
Additional information: microRNA-seq, microRNA Counts
minus strand signal of all reads
A signal coverage track of all reads (unique & multimapping) on the minus strand.
File formats: bigWig
Additional information: Small RNA-seq, Bulk RNA-seq, microRNA-seq
minus strand signal of unique reads
A signal coverage track of unique reads on the minus strand.
File formats: bigWig
Additional information: Small RNA-seq, Bulk RNA-seq, microRNA-seq
normalized signal of all reads
A normalized signal coverage track of all reads (unique & multimapping).
File formats: bed, bigWig
optimal IDR thresholded peaks
In replicated experiments, the largest set of reproducible peak calls that pass an IDR threshold when the replicates are analyzed.
File formats: bed, bigBed
peaks
Detected regions of relative enrichment in coverage data.
File formats: bed, bigBed
Additional information: Transcription Factor ChIP-seq, Histone ChIP-seq, ATAC-seq
plus strand signal of all reads
A signal coverage track of all reads (unique & multimapping) on the plus strand.
File formats: bigWig
Additional information: Small RNA-seq, Bulk RNA-seq, microRNA-seq
plus strand signal of unique reads
A signal coverage track of unique reads on the plus strand.
File formats: bigWig
Additional information: Small RNA-seq, Bulk RNA-seq, microRNA-seq
pseudoreplicated IDR thresholded peaks
The set of peak calls from two partitions, or "pseudoreplicates," that are well supported in both (i.e., that cross the same IDR threshold as for replicated experiments).
File formats: bed, bigBed
pseudoreplicated peaks
The set of peak calls from two partitions, or "pseudoreplicates."
File formats: bed, bigBed
Additional information: Histone ChIP-seq, ATAC-seq
raw signal
The raw signal coverage track of all reads.
File formats: bigWig
read-depth normalized signal
A signal coverage track normalized by read depth.
File formats: bigWig
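One common way to normalize a coverage track by read depth is to scale it to a fixed number of reads, so that libraries of different sizes become comparable. The sketch below assumes a per-million scaling; the scale factor, function name, and example values are illustrative assumptions, not the documented method behind these files.

```python
def normalize_by_read_depth(coverage, total_reads, scale=1_000_000):
    """Rescale per-position coverage to signal per `scale` sequenced reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    factor = scale / total_reads
    return [c * factor for c in coverage]


# Hypothetical coverage from a library of two million reads.
normalized = normalize_by_read_depth([0, 4, 8], total_reads=2_000_000)
print(normalized)
```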
reads
Individual sequences of bases corresponding to DNA or RNA fragments in a FASTQ text file format.
File formats: fastq
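A FASTQ record conventionally spans four lines: an identifier line starting with `@`, the base sequence, a separator line starting with `+`, and a quality string of the same length as the sequence. The minimal parser below assumes that unwrapped four-line layout; the function name and example reads are hypothetical.

```python
import io


def parse_fastq(handle):
    """Yield (read_id, sequence, quality) tuples from a FASTQ text stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], sequence, quality


# Two hypothetical reads held in memory instead of a file on disk.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!!!\n")
records = list(parse_fastq(example))
print(records)
```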
reference variants
Coordinates and genotypes of variants for a reference genome.
File formats: vcf
replicated peaks
Detected regions of relative enrichment in coverage data observed in both replicates.
File formats: bed, bigBed
Additional information: Histone ChIP-seq, ATAC-seq
sequence alignability
A genomic track providing a measure of how often the sequence of a given length found at a particular location will align within the whole genome.
File formats: bed, bigBed
Additional information: DNase-seq
signal of all reads
A signal coverage track of all reads (unique & multimapping).
File formats: bigWig, wig
signal of unique reads
A signal coverage track of unique reads.
File formats: bigWig, bed, csv
signal p-value
Nucleotide-resolution signal coverage track, expressed as a p-value to reject the null hypothesis that the signal at that location is present in the control.
File formats: bigWig
Additional information: Histone ChIP-seq
spike-ins
Nucleic acid fragments of known sequence and quantity used for calibration in high-throughput sequencing.
File formats: fasta
splice junctions
Genomic locations of exon-exon boundaries in transcripts.
File formats: tsv
transcript quantifications
Counts (reads in single-ended or read pairs in paired-ended sequencing runs) that map to individual transcript isoforms (these may include spike-ins).
File formats: tsv, bigBed
Additional information: Long read RNA-seq
transcription start sites
An annotation or set of regions that are identified as transcription start sites (TSS) in the genome.
File formats: bed, bigBed, gff, gtf
Additional information: RAMPAGE and CAGE
transcriptome alignments
The mapping locations of input reads with respect to the transcriptome.
File formats: bam
transcriptome annotations
Genomic coordinates of transcripts and their known or novel status as compared to reference annotation.
File formats: gtf
transcriptome index
A preprocessed form of the transcriptome reference used to facilitate downstream analysis.
File formats: idx, database
transcriptome reference
The transcriptomic sequence of an idealized representative individual in a species.
File formats: tsv
unfiltered alignments
The mapping locations of input reads with respect to a genome or other provided reference without any filtering (such as removing duplicates).
File formats: bam