Glossary


General terms

functional characterization data

Data generated by assays (e.g., STARR-seq, MPRA, and CRISPR screens) investigating the relationship between DNA sequences and their regulatory activities.

functional genomics data

Data generated by assays investigating processes such as transcription, translation and epigenetic regulation on a genome-wide scale. Examples of assays generating functional genomics data: RNA-seq, ChIP-seq, DNase-seq, ATAC-seq, WGBS, HiC, and ChIA-PET.

File output types

alignments

The mapping locations of input reads with respect to a genome or other provided reference.

File formats: bam

conservative IDR thresholded peaks

In replicated experiments, the set of reproducible peaks that pass an IDR threshold when the two replicates are compared.

File formats: bed, bigBed

Additional information: Transcription Factor ChIP-seq, ATAC-seq

enrichment

Elements or regions that appear at a statistically elevated rate compared to the control or baseline.

File formats: tsv, csv, bed

Additional information: RNA Bind-N-Seq
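
For k-mer data such as RNA Bind-N-Seq, enrichment is often expressed as the ratio of a k-mer's frequency in the protein-bound library to its frequency in a control library. The Python sketch below computes only that ratio, under that assumption; it does not perform the statistical test that decides whether an elevated rate is significant, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def kmer_frequencies(reads: List[str], k: int) -> Dict[str, float]:
    """Frequency of each k-mer among all k-mer positions in a library."""
    counts = Counter(read[i:i + k] for read in reads for i in range(len(read) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()} if total else {}


def kmer_enrichment(bound: List[str], control: List[str], k: int) -> Dict[str, float]:
    """Frequency in the bound library divided by frequency in the control library.

    Only k-mers observed in the control are scored, which avoids division by zero.
    """
    bound_freq = kmer_frequencies(bound, k)
    control_freq = kmer_frequencies(control, k)
    return {kmer: bound_freq.get(kmer, 0.0) / f for kmer, f in control_freq.items()}


if __name__ == "__main__":
    print(kmer_enrichment(["AAAA"], ["AAGG"], k=2))
```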

exclusion list regions

A comprehensive set of regions that have anomalous, unstructured, or high signal in next-generation sequencing experiments independent of cell line or experiment.

File formats: bed
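
A typical use of an exclusion list is to discard peaks that fall in these regions. The sketch below assumes BED-style intervals (chromosome, 0-based start, exclusive end) and uses a naive pairwise scan; the function names are illustrative, and dedicated interval tools would be used at genome scale.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style 0-based half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def drop_excluded(peaks: List[Interval], excluded: List[Interval]) -> List[Interval]:
    """Keep only peaks that overlap no exclusion-list region (O(n*m) scan)."""
    return [p for p in peaks if not any(overlaps(p, r) for r in excluded)]


if __name__ == "__main__":
    peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
    print(drop_excluded(peaks, [("chr1", 150, 160)]))
```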

FDR cut rate

Genomic regions with statistically significant enrichments, or "hotspots", of DNase I cleavage activity at different false discovery rates.

File formats: bed, bigBed

fold change over control

Nucleotide-resolution signal coverage track expressing the fold change over control at each position.

File formats: bigWig
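
A minimal sketch of a fold-change track, assuming the treatment and control coverage arrays are already scaled to comparable sequencing depth and cover the same positions. The pseudocount is an assumption added here to keep the ratio finite where the control has no coverage; production peak callers handle this in their own ways.

```python
from typing import List


def fold_change_over_control(treatment: List[float], control: List[float],
                             pseudocount: float = 1.0) -> List[float]:
    """Per-position ratio of treatment to control coverage, with a pseudocount."""
    if len(treatment) != len(control):
        raise ValueError("tracks must cover the same positions")
    return [(t + pseudocount) / (c + pseudocount) for t, c in zip(treatment, control)]


if __name__ == "__main__":
    print(fold_change_over_control([9, 3, 0], [2, 3, 0]))
```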

Additional information: Histone ChIP-seq

footprints

Genomic sites delineating regions occupied by a protein or transcription factor and protected from degradation by an enzyme such as DNase I.

File formats: bed, bigBed

Additional information: DNase-seq

gene quantifications

Quantifications of reads (or read pairs, in paired-end sequencing) aligning to the gene annotation reference, expressed as either raw or normalized counts.

File formats: tsv

Additional information: Small RNA-seq, Bulk RNA-seq
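
The distinction between raw and normalized counts, and between counting reads and read pairs, can be illustrated with a small Python sketch. It assumes each alignment has already been assigned to one gene and that both mates of a pair carry the same read name; counts per million (CPM) is used only as one simple normalization, not as the method any particular pipeline applies.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def count_fragments(assignments: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Raw counts per gene from (read_name, gene_id) assignments.

    Both mates of a pair share one read name, so collecting names in a set
    counts a read pair once, and a single-end read once.
    """
    names_per_gene: Dict[str, Set[str]] = defaultdict(set)
    for read_name, gene_id in assignments:
        names_per_gene[gene_id].add(read_name)
    return {gene: len(names) for gene, names in names_per_gene.items()}


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw counts so that they sum to one million (CPM)."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: n * 1e6 / total for gene, n in counts.items()}
```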

genome index

A preprocessed form of the genome reference used to facilitate downstream analysis.

File formats: tar, tsv, gff

Additional information: RAMPAGE and CAGE, WGBS, Bulk RNA-seq

genome reference

A composite nucleic acid sequence assembled from the sequences of several individual organisms representing the species.

File formats: fasta, tar, gff, gtf

Additional information: ENCODE Reference Sequences

hotspots

Genomic regions with statistically significant enrichments, or "hotspots", of DNase I cleavage activity.

File formats: bed, bigBed

Additional information: DNase-seq

IDR ranked peaks

The set of peak calls ranked by IDR score.

File formats: bed

Additional information: Transcription Factor ChIP-seq, ATAC-seq

IDR thresholded peaks

The set of peak calls that pass an IDR threshold, indicating statistical confidence that these are reproducible peaks.

File formats: bed, bigBed

Additional information: Transcription Factor ChIP-seq, ATAC-seq
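
The relationship between IDR ranked peaks and IDR thresholded peaks can be sketched as sorting followed by a cutoff. The Peak record, its idr field, and the 0.05 default are assumptions chosen for illustration; if a file stores scores on a -log10 scale, convert them back to IDR values before comparing them against a threshold.

```python
from typing import List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    idr: float  # hypothetical field: IDR value in [0, 1]; lower means more reproducible


def rank_by_idr(peaks: List[Peak]) -> List[Peak]:
    """Order peaks from most to least reproducible (ascending IDR)."""
    return sorted(peaks, key=lambda p: p.idr)


def idr_thresholded(peaks: List[Peak], threshold: float = 0.05) -> List[Peak]:
    """Keep the ranked peaks whose IDR does not exceed the threshold."""
    return [p for p in rank_by_idr(peaks) if p.idr <= threshold]


def idr_from_neg_log10(score: float) -> float:
    """Convert a score stored as -log10(IDR) back to an IDR value."""
    return 10 ** (-score)
```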

library fraction

Estimates of the fraction of RBNS reads bound at each k-mer in an RBNS library, listed in descending order.

File formats: tsv

Additional information: RNA Bind-N-Seq
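
As an illustration only, the sketch below assumes that a read is "bound at" a k-mer when it contains that k-mer at least once, and reports the fraction of reads for each k-mer in descending order. The RBNS pipeline's own estimator may differ; the function name is hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def kmer_library_fraction(reads: List[str], k: int) -> List[Tuple[str, float]]:
    """Fraction of reads containing each k-mer at least once, highest first.

    Ties are broken alphabetically so the output order is deterministic.
    """
    if not reads:
        return []
    counts: Counter = Counter()
    for read in reads:
        # A set, so a k-mer occurring twice in one read is counted once for that read.
        counts.update({read[i:i + k] for i in range(len(read) - k + 1)})
    n = len(reads)
    return sorted(((kmer, c / n) for kmer, c in counts.items()),
                  key=lambda item: (-item[1], item[0]))
```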

methylation state at CHG

The read depth and percent methylation at CHG sites.

File formats: bed, bigBed

Additional information: WGBS

methylation state at CHH

The read depth and percent methylation at CHH sites.

File formats: bed, bigBed

Additional information: WGBS

methylation state at CpG

The read depth and percent methylation at CpG sites.

File formats: bed, bigBed

Additional information: WGBS
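
The three methylation entries differ only in sequence context: a cytosine followed by G is CpG, one followed by a non-G base (H = A, C, or T) and then G is CHG, and one followed by two non-G bases is CHH. Percent methylation is the methylated read count divided by the read depth at the site. The sketch below handles plus-strand cytosines only (minus-strand sites are classified the same way on the reverse complement); function names are illustrative.

```python
from typing import Optional

H_BASES = "ACT"  # IUPAC H: any base except G


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Classify the plus-strand cytosine at seq[i] as 'CpG', 'CHG' or 'CHH'.

    Returns None if seq[i] is not C, or if the downstream bases needed are
    missing or ambiguous (e.g. N).
    """
    seq = seq.upper()
    if seq[i] != "C" or i + 1 >= len(seq):
        return None
    second = seq[i + 1]
    if second == "G":
        return "CpG"
    if second not in H_BASES or i + 2 >= len(seq):
        return None
    third = seq[i + 2]
    if third == "G":
        return "CHG"
    return "CHH" if third in H_BASES else None


def percent_methylation(methylated: int, unmethylated: int) -> float:
    """Percent methylation at a site; the site's read depth is the sum of both counts."""
    depth = methylated + unmethylated
    return 100.0 * methylated / depth if depth else float("nan")
```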

microRNA quantifications

Counts (reads in single-end or read pairs in paired-end sequencing runs) that map to each microRNA gene in the reference annotation.

File formats: tsv, bed, bigBed

Additional information: microRNA-seq, microRNA Counts

minus strand signal of all reads

A signal coverage track of all reads (unique & multimapping) on the minus strand.

File formats: bigWig

Additional information: Small RNA-seq, Bulk RNA-seq, microRNA-seq

minus strand signal of unique reads

A signal coverage track of unique reads on the minus strand.

File formats: bigWig

Additional information: Small RNA-seq, Bulk RNA-seq, microRNA-seq

normalized signal of all reads

A normalized signal coverage track of all reads (unique & multimapping).

File formats: bed, bigWig

optimal IDR thresholded peaks

In replicated experiments, the largest set of reproducible peak calls that pass an IDR threshold in the replicate analysis.

File formats: bed, bigBed
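
Assuming the two candidate sets are the IDR thresholded peaks from the true replicates and those from pseudoreplicates of the pooled data (the scheme described in ENCODE's ChIP-seq pipeline documentation), the choice between the conservative and optimal sets reduces to a size comparison. The function name and peak representation below are hypothetical.

```python
from typing import List, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end)


def conservative_and_optimal(true_replicate_peaks: List[Peak],
                             pseudoreplicate_peaks: List[Peak]) -> Tuple[List[Peak], List[Peak]]:
    """Return (conservative, optimal) peak sets.

    conservative: peaks passing the IDR threshold between the two true replicates.
    optimal: whichever candidate set is larger (ties go to the true-replicate set).
    """
    conservative = true_replicate_peaks
    optimal = max(true_replicate_peaks, pseudoreplicate_peaks, key=len)
    return conservative, optimal
```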

peaks

Detected regions of relative enrichment in coverage data.

File formats: bed, bigBed

Additional information: Transcription Factor ChIP-seq, Histone ChIP-seq, ATAC-seq

plus strand signal of all reads

A signal coverage track of all reads (unique & multimapping) on the plus strand.

File formats: bigWig

Additional information: Small RNA-seq, Bulk RNA-seq, microRNA-seq

plus strand signal of unique reads

A signal coverage track of unique reads on the plus strand.

File formats: bigWig

Additional information: Small RNA-seq, Bulk RNA-seq, microRNA-seq

pseudoreplicated IDR thresholded peaks

The set of peak calls from two partitions, or "pseudoreplicates," that are well supported in both (i.e., that pass the same IDR threshold used for replicated experiments).

File formats: bed, bigBed

pseudoreplicated peaks

The set of peak calls from two partitions, or "pseudoreplicates."

File formats: bed, bigBed

Additional information: Histone ChIP-seq, ATAC-seq

raw signal

The raw signal coverage track of all reads.

File formats: bigWig

read-depth normalized signal

A signal coverage track normalized by read depth.

File formats: bigWig
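
The glossary does not specify the normalization factor. Scaling to reads per million mapped reads (RPM) is one common choice and is used in this sketch as an assumption; the function name is illustrative.

```python
from typing import List


def rpm_normalize(coverage: List[float], total_mapped_reads: int) -> List[float]:
    """Scale raw coverage to reads per million mapped reads (RPM)."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    scale = 1e6 / total_mapped_reads
    return [value * scale for value in coverage]


if __name__ == "__main__":
    print(rpm_normalize([5, 0, 20], total_mapped_reads=2_000_000))
```

Because every position is multiplied by the same factor, the shape of the track is unchanged; only its scale becomes comparable across libraries of different depth.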

reads

Individual sequences of bases corresponding to DNA or RNA fragments, stored in the FASTQ text file format.

File formats: fastq
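
A FASTQ record spans four lines: an `@` header with the read name, the base sequence, a `+` separator line, and a quality string of the same length as the sequence. The reader below is a minimal sketch that assumes unwrapped (single-line) sequences and the Phred+33 quality encoding; it is not a substitute for an established parsing library.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if (not header.startswith("@") or not separator.startswith("+")
                or len(sequence) != len(quality)):
            raise ValueError("malformed FASTQ record")
        yield FastqRecord(header[1:].rstrip("\n"), sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode quality characters to Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality]


if __name__ == "__main__":
    for record in read_fastq(io.StringIO("@read1\nACGT\n+\nIIII\n")):
        print(record.name, record.sequence, phred_scores(record.quality))
```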

reference variants

Coordinates and genotypes of variants for a reference genome.

File formats: vcf
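
In a VCF data line the first eight tab-separated columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER, and INFO, optionally followed by a FORMAT column and one column per sample. The GT field lists allele indices, where 0 is the REF allele and 1, 2, ... index the comma-separated ALT alleles. The sketch below decodes the genotype of the first sample only and expects the caller to skip `#` header lines; the function name is illustrative.

```python
from typing import Dict


def parse_vcf_line(line: str) -> Dict[str, object]:
    """Parse one VCF data line; decode GT of the first sample if present."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt = fields[:5]
    alt_alleles = alt.split(",")
    record: Dict[str, object] = {
        "chrom": chrom, "pos": int(pos), "id": variant_id,
        "ref": ref, "alt": alt_alleles,
    }
    if len(fields) > 9:
        sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
        gt = sample.get("GT", ".")
        sep = "|" if "|" in gt else "/"  # '|' marks a phased genotype
        alleles = [ref] + alt_alleles
        record["genotype"] = [alleles[int(a)] if a != "." else None for a in gt.split(sep)]
    return record
```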

replicated peaks

Detected regions of relative enrichment in coverage data observed in both replicates.

File formats: bed, bigBed

Additional information: Histone ChIP-seq, ATAC-seq
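
The definition leaves the overlap rule unstated. The sketch below assumes a peak from the first replicate counts as replicated when some peak in the second replicate overlaps it by at least half the length of either peak; the 0.5 fraction and the function names are assumptions for illustration, and intervals are BED-style half-open.

```python
from typing import List, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end), BED-style 0-based half-open


def overlap_bp(a: Peak, b: Peak) -> int:
    """Number of bases shared by two peaks (0 if on different chromosomes)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def replicated_peaks(rep1: List[Peak], rep2: List[Peak],
                     min_fraction: float = 0.5) -> List[Peak]:
    """Peaks from rep1 that a rep2 peak overlaps by at least min_fraction of either peak."""
    kept = []
    for p in rep1:
        for q in rep2:
            shared = overlap_bp(p, q)
            if shared > 0 and (shared >= min_fraction * (p[2] - p[1])
                               or shared >= min_fraction * (q[2] - q[1])):
                kept.append(p)
                break  # one supporting peak in rep2 is enough
    return kept
```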

sequence alignability

A genomic track measuring how often a sequence of a given length found at a particular location aligns within the whole genome.

File formats: bed, bigBed

Additional information: DNase-seq
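
One common convention for alignability (mappability) tracks scores each position as the inverse of the number of places its k-mer occurs, so unique sequence scores 1. The sketch below applies that convention to a toy genome string. Real tracks also account for the reverse strand and often a tolerated number of mismatches, which this sketch ignores; the function name is illustrative.

```python
from collections import Counter
from typing import List


def alignability(genome: str, k: int) -> List[float]:
    """Score each start position as 1 / (exact occurrences of its k-mer in the genome).

    1.0 means the k-mer is unique; 0.5 means it occurs twice, and so on.
    Simplifications: one sequence, forward strand only, no mismatches allowed.
    """
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return [1.0 / counts[kmer] for kmer in kmers]
```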

signal of all reads

A signal coverage track of all reads (unique & multimapping).

File formats: bigWig, wig

signal of unique reads

A signal coverage track of unique reads.

File formats: bigWig, bed, csv

signal p-value

Nucleotide-resolution signal coverage track expressed as a p-value for rejecting the null hypothesis that the signal at that location is present in the control.

File formats: bigWig

Additional information: Histone ChIP-seq
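
A signal p-value is commonly derived from a Poisson test of the treatment count at a position against an expected count estimated from the control; peak callers such as MACS2 use a model of this kind. The sketch below computes the -log10 p-value for a single position under that assumption. The function name is hypothetical, and the simple summation is an illustration rather than a numerically robust implementation.

```python
import math


def poisson_neg_log10_pvalue(observed: int, expected: float) -> float:
    """-log10 of P(X >= observed) for X ~ Poisson(expected).

    Computed as 1 minus the lower cumulative sum, which is adequate for modest
    counts; very small p-values are clamped to avoid log10(0).
    """
    if expected <= 0:
        raise ValueError("expected (control lambda) must be positive")
    if observed <= 0:
        return 0.0  # P(X >= 0) = 1
    lower = sum(
        math.exp(i * math.log(expected) - expected - math.lgamma(i + 1))
        for i in range(observed)
    )
    tail = max(1.0 - lower, 1e-300)
    return -math.log10(tail)
```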

spike-ins

Nucleic acid fragments of known sequence and quantity used for calibration in high-throughput sequencing.

File formats: fasta

splice junctions

Genomic locations of exon-exon boundaries in transcripts.

File formats: tsv
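
In spliced alignments, exon-exon boundaries appear as `N` (skipped region) operations in the SAM CIGAR string, and the operations `M`, `D`, `N`, `=`, and `X` advance along the reference. The sketch below recovers intron coordinates from one alignment, assuming the 1-based leftmost position from the SAM POS field; reporting junctions as 1-based inclusive intron coordinates is a choice made for this illustration.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def junctions_from_cigar(pos: int, cigar: str) -> List[Tuple[int, int]]:
    """Intron (start, end) pairs, 1-based inclusive, implied by N operations.

    pos is the 1-based leftmost reference position, as in the SAM POS field.
    """
    junctions = []
    ref = pos
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op == "N":
            junctions.append((ref, ref + length - 1))
        if op in REF_CONSUMING:
            ref += length
    return junctions
```

For example, a read at position 100 with CIGAR `50M200N50M` covers 100-149, skips an intron at 150-349, and resumes at 350.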

transcript quantifications

Counts (reads in single-end or read pairs in paired-end sequencing runs) that map to individual transcript isoforms (these may include spike-ins).

File formats: tsv, bigBed

Additional information: Long read RNA-seq

transcription start sites

An annotation or set of regions identified as transcription start sites (TSS) in the genome.

File formats: bed, bigBed, gff, gtf

Additional information: RAMPAGE and CAGE

transcriptome alignments

The mapping locations of input reads with respect to the transcriptome.

File formats: bam

transcriptome annotations

Genomic coordinates of transcripts and their known or novel status compared to the reference annotation.

File formats: gtf

transcriptome index

A preprocessed form of the transcriptome reference used to facilitate downstream analysis.

File formats: idx, database

transcriptome reference

The transcriptomic sequence of an idealized representative individual in a species.

File formats: tsv

unfiltered alignments

The mapping locations of input reads with respect to a genome or other provided reference without any filtering (such as removing duplicates).

File formats: bam