Glossary
General terms | File output types | Target categories | candidate Cis-Regulatory Element (cCRE) subtypes
General terms
functional characterization data
Data generated by assays (e.g., STARR-seq, MPRA, and CRISPR screens) investigating the relationship between DNA sequences and their regulatory activities.
functional genomics data
Data generated by assays investigating processes such as transcription, translation and epigenetic regulation on a genome-wide scale. Examples of assays generating functional genomics data: RNA-seq, ChIP-seq, DNase-seq, ATAC-seq, WGBS, HiC, and ChIA-PET.
File output types
alignments
The mapping locations of input reads with respect to a genome or other provided reference.
File formats: bam
conservative IDR thresholded peaks
In replicated experiments, the set of reproducible peaks that pass an IDR threshold from two replicates.
File formats: bed, bigBed
Additional information: Transcription Factor ChIP-seq, ATAC-seq
element gene interactions p-value
Arcs linking candidate regulatory elements with their target genes, and the significance of the measured change in expression upon CRISPR-based perturbation of the regulatory element.
File formats: bed, bigBed
element gene interactions signal
Arcs linking candidate regulatory elements with their target genes, and the measured change in expression upon CRISPR-based perturbation of the regulatory element.
File formats: bed, bigBed
element quantifications
The identifying information and expression change measurements at candidate regulatory elements in the examined loci following perturbations in a pooled CRISPR screen experiment.
File formats: tsv
enrichment
Elements or regions that appear at a statistically elevated rate compared to the control or baseline.
File formats: tsv, csv, bed
Additional information: RNA Bind-N-Seq
exclusion list regions
A comprehensive set of regions that have anomalous, unstructured, or high signal in next-generation sequencing experiments independent of cell line or experiment.
File formats: bed
FDR cut rate
Genomic regions with statistically significant enrichments, or "hotspots", of DNase I cleavage activity at different false discovery rates.
File formats: bed, bigBed
fine-mapped variants
Variants identified by fine-mapping, the process by which a trait-associated region from a genome-wide association study (GWAS) is analysed to identify the particular genetic variants that are likely to causally influence the examined trait.
File formats: tsv
fold change over control
Nucleotide-resolution signal coverage track, expressed as fold change over control at each position.
File formats: bigWig
Additional information: Histone ChIP-seq
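As a rough illustration of the fold change over control entry above, the sketch below divides the treatment coverage by the control coverage at each position. The function name `fold_change_over_control` and the pseudocount are assumptions made for this example; they are not taken from the ENCODE pipelines, which may compute the track differently.

```python
def fold_change_over_control(treatment, control, pseudocount=1.0):
    """Return per-position fold change of treatment coverage over control.

    treatment, control: equal-length sequences of non-negative coverage values.
    pseudocount: hypothetical smoothing term (assumed > 0) that avoids
    division by zero where the control has no coverage.
    """
    if len(treatment) != len(control):
        raise ValueError("tracks must cover the same positions")
    return [
        (t + pseudocount) / (c + pseudocount)
        for t, c in zip(treatment, control)
    ]
```

Values above 1 mark positions where the experiment has more signal than the control; values below 1 mark the opposite.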
footprints
Genomic sites delineating regions occupied by a protein or transcription factor and protected from degradation by an enzyme such as DNase I.
File formats: bed, bigBed
Additional information: DNase-seq
gene quantifications
Quantifications of reads (or read pairs, in paired-end sequencing) aligning to the gene annotation reference, either by raw or normalized counts.
File formats: tsv
Additional information: Small RNA-seq, Bulk RNA-seq
genome index
A preprocessed form of the genome reference used to facilitate downstream analysis.
File formats: tar, tsv, gff
Additional information: RAMPAGE and CAGE, WGBS, Bulk RNA-seq
genome reference
A composite nucleic acid sequence assembled from the sequence of several different individual organisms representing the species.
File formats: fasta, tar, gff, gtf
Additional information: ENCODE Reference Sequences
guide quantifications
The identifying information and sequencing read counts for each gRNA in a pooled CRISPR screen experiment.
File formats: tsv
hotspots
Genomic regions with statistically significant enrichments, or "hotspots", of DNase I cleavage activity.
File formats: bed, bigBed
Additional information: DNase-seq
IDR ranked peaks
The set of peak calls ranked by IDR score.
File formats: bed
Additional information: Transcription Factor ChIP-seq, ATAC-seq
IDR thresholded peaks
The set of peak calls that pass an IDR threshold, indicating statistical confidence that these are reproducible peaks.
File formats: bed, bigBed
Additional information: Transcription Factor ChIP-seq, ATAC-seq
library fraction
Estimates of the fraction of RBNS reads that are bound at different k-mers in an RBNS library, in descending order.
File formats: tsv
Additional information: RNA Bind-N-Seq
methylation state at CHG
The read depth and percent methylation at CHG sites.
File formats: bed, bigBed
Additional information: WGBS
methylation state at CHH
The read depth and percent methylation at CHH sites.
File formats: bed, bigBed
Additional information: WGBS
methylation state at CpG
The read depth and percent methylation at CpG sites.
File formats: bed, bigBed
Additional information: WGBS
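The three methylation state entries above each report a read depth and a percent methylation per site. A minimal sketch of how those two numbers relate, assuming per-site counts of methylated and unmethylated reads are already available (the function name `methylation_state` and its inputs are hypothetical, not part of any ENCODE file specification):

```python
def methylation_state(methylated_reads, unmethylated_reads):
    """Return (read_depth, percent_methylation) for one cytosine site.

    percent_methylation is None when no reads cover the site, because the
    ratio is undefined at zero depth.
    """
    depth = methylated_reads + unmethylated_reads
    if depth == 0:
        return 0, None
    return depth, 100.0 * methylated_reads / depth
```

The same calculation applies regardless of sequence context (CpG, CHG, or CHH); only the set of sites differs.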
microRNA quantifications
Counts (reads in single-ended or read pairs in paired-ended sequencing runs) that map to each microRNA gene in the reference annotation.
File formats: tsv, bed, bigBed
Additional information: microRNA-seq, microRNA Counts
minus strand signal of all reads
A signal coverage track of all reads (unique & multimapping) on the minus strand.
File formats: bigWig
Additional information: Small RNA-seq, Bulk RNA-seq, microRNA-seq
minus strand signal of unique reads
A signal coverage track of unique reads on the minus strand.
File formats: bigWig
Additional information: Small RNA-seq, Bulk RNA-seq, microRNA-seq
normalized signal of all reads
A normalized signal coverage track of all reads (unique & multimapping).
File formats: bed, bigWig
optimal IDR thresholded peaks
In replicated experiments, the largest set of reproducible peak calls that pass an IDR threshold when analyzing replicates.
File formats: bed, bigBed
peaks
Detected regions of relative enrichment in coverage data.
File formats: bed, bigBed
Additional information: Transcription Factor ChIP-seq, Histone ChIP-seq, ATAC-seq
perturbation signal
Guide RNA effect size track, with effects averaged across all gRNAs at a given nucleotide position.
File formats: bigWig
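The averaging described in the perturbation signal entry above can be sketched as follows. This is an illustration only: it assumes each gRNA is represented as a half-open, 0-based interval with a single effect size, and that a gRNA contributes to every nucleotide in its interval. The glossary does not state how gRNAs are assigned to positions, and the function name `perturbation_signal` is hypothetical.

```python
from collections import defaultdict


def perturbation_signal(guides):
    """Average gRNA effect sizes at each nucleotide position.

    guides: iterable of (start, end, effect) tuples, where [start, end) is
    the assumed span of the gRNA and effect is its effect size.
    Returns a dict mapping position -> mean effect of all gRNAs covering it.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for start, end, effect in guides:
        for pos in range(start, end):
            totals[pos] += effect
            counts[pos] += 1
    return {pos: totals[pos] / counts[pos] for pos in sorted(totals)}
```

Positions covered by a single gRNA keep that gRNA's effect size; positions covered by several take their mean.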
plus strand signal of all reads
A signal coverage track of all reads (unique & multimapping) on the plus strand.
File formats: bigWig
Additional information: Small RNA-seq, Bulk RNA-seq, microRNA-seq
plus strand signal of unique reads
A signal coverage track of unique reads on the plus strand.
File formats: bigWig
Additional information: Small RNA-seq, Bulk RNA-seq, microRNA-seq
pseudoreplicated IDR thresholded peaks
The set of peak calls from two partitions, or "pseudoreplicates", that are well-supported in both (i.e., cross the same IDR threshold as for replicated experiments).
File formats: bed, bigBed
pseudoreplicated peaks
The set of peak calls from two partitions, or "pseudoreplicates."
File formats: bed, bigBed
Additional information: Histone ChIP-seq, ATAC-seq
raw signal
The raw signal coverage track of all reads.
File formats: bigWig
read-depth normalized signal
A signal coverage track normalized by read depth.
File formats: bigWig
reads
Individual sequences of bases corresponding to DNA or RNA fragments in a FASTQ text file format.
File formats: fastq
reference variants
Coordinates and genotypes of variants for a reference genome.
File formats: vcf
replicated peaks
Detected regions of relative enrichment in coverage data observed in both replicates.
File formats: bed, bigBed
Additional information: Histone ChIP-seq, ATAC-seq
sequence alignability
A genomic track providing a measure of how often the sequence of a given length found at a particular location will align within the whole genome.
File formats: bed, bigBed
Additional information: DNase-seq
signal of all reads
A signal coverage track of all reads (unique & multimapping).
File formats: bigWig, wig
signal of unique reads
A signal coverage track of unique reads.
File formats: bigWig, bed, csv
signal p-value
Nucleotide resolution signal coverage track, expressed as a p-value to reject the null hypothesis that the signal at that location is present in the control.
File formats: bigWig
Additional information: Histone ChIP-seq
spike-ins
Nucleic acid fragments of known sequence and quantity used for calibration in high-throughput sequencing.
File formats: fasta
splice junctions
Genomic locations of exon-exon boundaries in transcripts.
File formats: tsv
transcript quantifications
Counts (reads in single-ended or read pairs in paired-ended sequencing runs) that map to individual transcript isoforms (these may include spike-ins).
File formats: tsv, bigBed
Additional information: Long read RNA-seq
transcription start sites
An annotation or set of regions that are identified as transcription start sites (TSS) in the genome.
File formats: bed, bigBed, gff, gtf
Additional information: RAMPAGE and CAGE
transcriptome alignments
The mapping locations of input reads with respect to the transcriptome.
File formats: bam
transcriptome annotations
Genomic coordinates of transcripts and their known or novel status as compared to reference annotation.
File formats: gtf
transcriptome index
A preprocessed form of the transcriptome reference used to facilitate downstream analysis.
File formats: idx, database
transcriptome reference
The transcriptomic sequence of an idealized representative individual in a species.
File formats: tsv
unfiltered alignments
The mapping locations of input reads with respect to a genome or other provided reference without any filtering (such as removing duplicates).
File formats: bam
Target categories
broad histone mark
Category of histone modifications that are frequently detected across relatively long continuous stretches of DNA. This category includes H3F3A, H3K27me3, H3K36me3, H3K4me1, H3K79me2, H3K79me3, H3K9me1, H3K9me2, H4K20me1.
chromatin remodeler
Proteins involved in any process that results in the specification, formation or maintenance of the physical structure of eukaryotic chromatin. For example, members of a protein complex that possesses histone deacetylase activity.
cofactor
A protein or a member of a complex that interacts specifically and non-covalently with a DNA-bound DNA-binding transcription factor to initiate, activate or repress gene transcription.
cohesin
Proteins involved in a cell cycle process in which the sister chromatids of a replicated chromosome become tethered to each other.
control
Non-specific targets or mock targets which serve as a placeholder for experiments that are used as controls.
DNA repair
Proteins involved in a process of restoring DNA after damage. A variety of different DNA repair pathways have been reported that include direct reversal, base excision repair, nucleotide excision repair, photoreactivation, bypass, double-strand break repair pathway, and mismatch repair pathway.
DNA replication
Proteins involved in a cellular metabolic process in which a cell duplicates one or more molecules of DNA.
histone
Protein members of a complex comprised of DNA wound around a multisubunit core and associated proteins, which forms the primary packing unit of DNA in the nucleus into higher order structures.
narrow histone mark
Category of histone modifications that are frequently detected across relatively short continuous stretches of DNA. This category includes H2AFZ, H3ac, H3K27ac, H3K4me2, H3K4me3, H3K9ac.
other context
Protein with a unique function that does not belong to any other category.
recombinant protein
Genetically modified proteins. One common type of modification is epitope tagging.
RNA binding protein
Targets which interact selectively and non-covalently with RNA molecules.
RNA polymerase complex
Components of one of the three nuclear DNA-directed RNA polymerase complexes found in all eukaryotes, including RNA polymerase I, II and III.
tag
Short peptides that can be fused to proteins of interest and serve as antigens for antibodies.
transcription factor
A protein or a member of a complex that interacts selectively and non-covalently with chromatin or a specific DNA sequence (sometimes referred to as a motif) to modulate gene transcription.
candidate Cis-Regulatory Element (cCRE) subtypes
candidate Cis-Regulatory Elements (cCREs)
Elements classified by integrative analysis of biochemical signatures such as histone modifications to fall within distinct regulatory categories (see subtypes in this section).
CTCF-only
Elements (cCREs) defined by a signature of high DNase and CTCF signal and low H3K4me3 and H3K27ac signal.
distal enhancer-like (dELS)
Elements (cCREs) putatively labeled as enhancers, defined by a signature of high DNase and H3K27ac signal. They are denoted distal due to their presence outside 2 kb of an annotated GENCODE transcription start site (TSS).
DNase-H3K4me3
Elements (cCREs) defined by a signature of high DNase and H3K4me3 signal and low H3K27ac signal, and that fall outside 200 bp of an annotated GENCODE transcription start site (TSS).
promoter-like (PLS)
Elements (cCREs) putatively labeled as promoters, defined by a signature of high DNase and H3K4me3 signal, that lie within 200 bp of an annotated GENCODE transcription start site (TSS).
proximal enhancer-like (pELS)
Elements (cCREs) putatively labeled as enhancers, defined by a signature of high DNase and H3K27ac signal. They are denoted proximal due to their presence within 2 kb of an annotated GENCODE transcription start site (TSS).
representative DNase hypersensitivity sites (rDHS)
Regions of chromatin sensitive to DNase cleavage, called DNase hypersensitivity sites (DHS), collected from many DNase-seq datasets for an organism which are then iteratively clustered and filtered to select a list of non-overlapping representative sites in that organism.
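The subtype definitions in this section can be read as a small decision rule. The sketch below encodes them under several assumptions that the glossary does not state: whether a signal counts as "high" is decided elsewhere and passed in as a boolean, the distance cutoffs are inclusive, and when more than one definition could apply the checks are tried in the order shown (promoter-like first). The function name `classify_ccre` is hypothetical.

```python
from typing import Optional


def classify_ccre(
    high_dnase: bool,
    high_h3k4me3: bool,
    high_h3k27ac: bool,
    high_ctcf: bool,
    tss_distance_bp: int,
) -> Optional[str]:
    """Assign a cCRE subtype label from signal calls and TSS distance.

    tss_distance_bp: distance to the nearest annotated GENCODE TSS.
    Returns None when no subtype definition in the glossary matches.
    """
    if not high_dnase:
        return None  # every subtype above requires high DNase signal
    if high_h3k4me3 and tss_distance_bp <= 200:
        return "PLS"
    if high_h3k27ac:
        return "pELS" if tss_distance_bp <= 2000 else "dELS"
    if high_h3k4me3:
        # H3K27ac is low here and the element lies outside 200 bp of a TSS
        return "DNase-H3K4me3"
    if high_ctcf:
        # both H3K4me3 and H3K27ac are low at this point
        return "CTCF-only"
    return None
```

For example, an element with high DNase and H3K27ac signal located 5,000 bp from the nearest TSS is labeled `dELS`, while the same signature 1,500 bp from a TSS is labeled `pELS`.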