Glossary

General terms | File output types | Target categories | candidate Cis-Regulatory Element (cCRE) subtypes

General terms

functional characterization data

Data generated by assays (e.g., STARR-seq, MPRA, and CRISPR screens) that investigate the relationship between DNA sequences and their regulatory activities.

functional genomics data

Data generated by assays investigating processes such as transcription, translation, and epigenetic regulation on a genome-wide scale. Examples of assays generating functional genomics data: RNA-seq, ChIP-seq, DNase-seq, ATAC-seq, WGBS, Hi-C, and ChIA-PET.

File output types

alignments

The mapping locations of input reads with respect to a genome or other provided reference.

File formats: bam

conservative IDR thresholded peaks

In replicated experiments, the set of reproducible peaks that pass an IDR threshold from two replicates.

File formats: bed, bigBed

Additional information: Transcription Factor ChIP-seq, ATAC-seq

element gene interactions p-value

Arcs linking candidate regulatory elements with their target genes, and the significance of the measured change in expression upon CRISPR-based perturbation of the regulatory element.

File formats: bed, bigBed

element gene interactions signal

Arcs linking candidate regulatory elements with their target genes, and the measured change in expression upon CRISPR-based perturbation of the regulatory element.

File formats: bed, bigBed

element quantifications

The identifying information and expression change measurements for candidate regulatory elements at the examined loci following perturbations in a pooled CRISPR screen experiment.

File formats: tsv

enrichment

Elements or regions that appear at a statistically elevated rate compared to the control or baseline.

File formats: tsv, csv, bed

Additional information: RNA Bind-N-Seq

exclusion list regions

A comprehensive set of regions that have anomalous, unstructured, or high signal in next-generation sequencing experiments independent of cell line or experiment.

File formats: bed

FDR cut rate

Genomic regions with statistically significant enrichments, or "hotspots", of DNase I cleavage activity at different false discovery rates.

File formats: bed, bigBed

fine-mapped variants

Variants identified by fine-mapping, the process by which a trait-associated region from a genome-wide association study (GWAS) is analysed to identify the particular genetic variants that are likely to causally influence the examined trait.

File formats: tsv

fold change over control

Nucleotide-resolution signal coverage track, expressed as the fold change over control at each position.

File formats: bigWig

Additional information: Histone ChIP-seq
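
As a rough illustration of what a fold-change-over-control track holds, the sketch below divides per-position experiment coverage by control coverage. The function name `fold_change_over_control` and the `pseudocount` parameter are assumptions made for this example and are not taken from any ENCODE pipeline; production pipelines typically also account for sequencing depth and local background.

```python
def fold_change_over_control(signal, control, pseudocount=1.0):
    """Return the per-position fold change of signal over control.

    `pseudocount` (an assumption of this sketch) is added to both tracks
    so that positions with zero control coverage do not divide by zero.
    """
    if len(signal) != len(control):
        raise ValueError("signal and control must cover the same positions")
    return [
        (s + pseudocount) / (c + pseudocount)
        for s, c in zip(signal, control)
    ]


# Hypothetical coverage values at four consecutive positions.
experiment = [9.0, 3.0, 0.0, 19.0]
background = [1.0, 3.0, 0.0, 4.0]
print(fold_change_over_control(experiment, background))
```

Values above 1 mark positions where the experiment has more coverage than the control; values near 1 indicate no enrichment.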

footprints

Genomic sites delineating regions occupied by a protein or transcription factor and protected from degradation by an enzyme such as DNase I.

File formats: bed, bigBed

Additional information: DNase-seq

gene quantifications

Quantifications of reads (or read pairs, in paired-end sequencing) aligning to the gene annotation reference, either by raw or normalized counts.

File formats: tsv

Additional information: Small RNA-seq, Bulk RNA-seq

genome index

A preprocessed form of the genome reference used to facilitate downstream analysis.

File formats: tar, tsv, gff

Additional information: RAMPAGE and CAGE, WGBS, Bulk RNA-seq

genome reference

A composite nucleic acid sequence assembled from the sequence of several different individual organisms representing the species.

File formats: fasta, tar, gff, gtf

Additional information: ENCODE Reference Sequences

guide quantifications

The identifying information and sequencing read counts for each gRNA in a pooled CRISPR screen experiment.

File formats: tsv

hotspots

Genomic regions with statistically significant enrichments, or "hotspots", of DNase I cleavage activity.

File formats: bed, bigBed

Additional information: DNase-seq

IDR ranked peaks

The set of peak calls ranked by IDR score.

File formats: bed

Additional information: Transcription Factor ChIP-seq, ATAC-seq

IDR thresholded peaks

The set of peak calls that pass an IDR threshold, indicating statistical confidence that these are reproducible peaks.

File formats: bed, bigBed

Additional information: Transcription Factor ChIP-seq, ATAC-seq
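
The relationship between IDR ranked peaks and IDR thresholded peaks can be sketched as a sort followed by a filter. This is a minimal illustration, assuming each peak is a dictionary with a hypothetical `idr` key holding its IDR value (lower means more reproducible); the default threshold of 0.05 is likewise an assumption of this example, and real peak files may encode IDR differently (for example, on a log scale).

```python
def rank_by_idr(peaks):
    """Return peaks ordered from most to least reproducible (lowest IDR first)."""
    return sorted(peaks, key=lambda peak: peak["idr"])


def idr_thresholded(peaks, threshold=0.05):
    """Keep only the ranked peaks whose IDR value is at or below the threshold."""
    return [peak for peak in rank_by_idr(peaks) if peak["idr"] <= threshold]


# Hypothetical peak records; coordinates and IDR values are made up.
peaks = [
    {"chrom": "chr1", "start": 100, "end": 300, "idr": 0.20},
    {"chrom": "chr1", "start": 900, "end": 1100, "idr": 0.01},
    {"chrom": "chr2", "start": 50, "end": 250, "idr": 0.04},
]
for peak in idr_thresholded(peaks):
    print(peak["chrom"], peak["start"], peak["end"], peak["idr"])
```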

library fraction

Estimates of the fraction of RBNS reads that are bound at different k-mers in an RBNS library, listed in descending order.

File formats: tsv

Additional information: RNA Bind-N-Seq

methylation state at CHG

The read depth and percent methylation at CHG sites.

File formats: bed, bigBed

Additional information: WGBS

methylation state at CHH

The read depth and percent methylation at CHH sites.

File formats: bed, bigBed

Additional information: WGBS

methylation state at CpG

The read depth and percent methylation at CpG sites.

File formats: bed, bigBed

Additional information: WGBS
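
The three methylation-state entries (CHG, CHH, CpG) differ only in the sequence context of the cytosine, where H denotes any base other than G. The sketch below shows one way to classify a cytosine's context and compute percent methylation from read counts. The function names are hypothetical, only the forward strand is considered, and percent methylation is assumed to be methylated reads divided by read depth.

```python
def cytosine_context(seq, pos):
    """Classify the cytosine at seq[pos] as 'CpG', 'CHG', or 'CHH'.

    Returns None if the base is not C, if the context runs past the end
    of the sequence, or if an ambiguous base (such as N) is encountered.
    """
    seq = seq.upper()
    if seq[pos] != "C" or pos + 1 >= len(seq):
        return None
    second = seq[pos + 1]
    if second == "G":
        return "CpG"
    if second not in "ACT" or pos + 2 >= len(seq):
        return None
    third = seq[pos + 2]
    if third == "G":
        return "CHG"
    if third in "ACT":
        return "CHH"
    return None


def percent_methylation(methylated_reads, read_depth):
    """Percent of reads at a site that support methylation (None if no coverage)."""
    if read_depth == 0:
        return None
    return 100.0 * methylated_reads / read_depth


print(cytosine_context("ACGT", 1), percent_methylation(3, 12))
```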

microRNA quantifications

Counts (reads in single-end, or read pairs in paired-end, sequencing runs) that map to each microRNA gene in the reference annotation.

File formats: tsv, bed, bigBed

Additional information: microRNA-seq, microRNA Counts

minus strand signal of all reads

A signal coverage track of all reads (unique & multimapping) on the minus strand.

File formats: bigWig

Additional information: Small RNA-seq, Bulk RNA-seq, microRNA-seq

minus strand signal of unique reads

A signal coverage track of unique reads on the minus strand.

File formats: bigWig

Additional information: Small RNA-seq, Bulk RNA-seq, microRNA-seq

normalized signal of all reads

A normalized signal coverage track of all reads (unique & multimapping).

File formats: bed, bigWig

optimal IDR thresholded peaks

In replicated experiments, the largest set of reproducible peak calls that pass an IDR threshold when the replicates are analyzed.

File formats: bed, bigBed

peaks

Detected regions of relative enrichment in coverage data.

File formats: bed, bigBed

Additional information: Transcription Factor ChIP-seq, Histone ChIP-seq, ATAC-seq

perturbation signal

Guide RNA effect size track, with effects averaged across all gRNAs at a given nucleotide position.

File formats: bigWig
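
The averaging described for the perturbation signal can be sketched as follows. This is an illustration only: the `(start, end, effect)` tuple layout, the 0-based half-open coordinates, and the function name `perturbation_signal` are assumptions of this example rather than a documented file layout.

```python
from collections import defaultdict


def perturbation_signal(guides):
    """Average gRNA effect sizes at each covered nucleotide position.

    `guides` is an iterable of (start, end, effect) tuples using 0-based,
    half-open coordinates (a convention assumed for this sketch).
    Returns a dict mapping position to the mean effect of all gRNAs
    that cover it.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for start, end, effect in guides:
        for pos in range(start, end):
            totals[pos] += effect
            counts[pos] += 1
    return {pos: totals[pos] / counts[pos] for pos in sorted(totals)}


# Two hypothetical gRNAs that overlap at position 102.
guides = [(100, 103, -2.0), (102, 104, -4.0)]
print(perturbation_signal(guides))
```

Positions covered by a single gRNA carry that gRNA's effect; the overlapping position carries the mean of both.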

plus strand signal of all reads

A signal coverage track of all reads (unique & multimapping) on the plus strand.

File formats: bigWig

Additional information: Small RNA-seq, Bulk RNA-seq, microRNA-seq

plus strand signal of unique reads

A signal coverage track of unique reads on the plus strand.

File formats: bigWig

Additional information: Small RNA-seq, Bulk RNA-seq, microRNA-seq

pseudoreplicated IDR thresholded peaks

The set of peak calls from two partitions, or "pseudoreplicates," that are well supported in both (i.e., they pass the same IDR threshold used for replicated experiments).

File formats: bed, bigBed

pseudoreplicated peaks

The set of peak calls from two partitions, or "pseudoreplicates."

File formats: bed, bigBed

Additional information: Histone ChIP-seq, ATAC-seq
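
To make the idea of pseudoreplicates concrete, the sketch below partitions a collection of reads into two near-equal halves. It assumes the partition is made by random assignment of reads; the function name `make_pseudoreplicates` and the `seed` parameter are hypothetical and chosen only so the example is reproducible.

```python
import random


def make_pseudoreplicates(reads, seed=0):
    """Randomly split reads into two near-equal pseudoreplicates.

    Every read lands in exactly one of the two partitions. A fixed seed
    (an assumption of this sketch) makes the split repeatable.
    """
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical read identifiers.
reads = [f"read_{i}" for i in range(11)]
pseudo_1, pseudo_2 = make_pseudoreplicates(reads)
print(len(pseudo_1), len(pseudo_2))
```

Peaks would then be called on each partition separately and compared, as with true replicates.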

raw signal

The raw signal coverage track of all reads.

File formats: bigWig

read-depth normalized signal

A signal coverage track normalized by read depth.

File formats: bigWig
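
One common way to normalize a coverage track by read depth is to rescale it to a fixed library size. The sketch below assumes scaling to counts per million reads; the function name `read_depth_normalize` and the `scale` parameter are hypothetical, and the exact normalization used for any given file may differ.

```python
def read_depth_normalize(coverage, total_reads, scale=1_000_000):
    """Rescale per-position coverage to a library of `scale` reads.

    With the default scale, the result is coverage per million reads
    (an assumed convention for this sketch).
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return [c * scale / total_reads for c in coverage]


# Hypothetical coverage from a library of five million reads.
print(read_depth_normalize([10, 0, 25], total_reads=5_000_000))
```

Normalizing this way lets tracks from libraries of different sizes be compared on the same scale.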

reads

Individual sequences of bases corresponding to DNA or RNA fragments, stored in the FASTQ text file format.

File formats: fastq

reference variants

Coordinates and genotypes of variants for a reference genome.

File formats: vcf

replicated peaks

Detected regions of relative enrichment in coverage data observed in both replicates.

File formats: bed, bigBed

Additional information: Histone ChIP-seq, ATAC-seq

sequence alignability

A genomic track providing a measure of how often the sequence of a given length found at a particular location will align within the whole genome.

File formats: bed, bigBed

Additional information: DNase-seq

signal of all reads

A signal coverage track of all reads (unique & multimapping).

File formats: bigWig, wig

signal of unique reads

A signal coverage track of unique reads.

File formats: bigWig, bed, csv

signal p-value

Nucleotide-resolution signal coverage track, expressed as a p-value to reject the null hypothesis that the signal at that location is present in the control.

File formats: bigWig

Additional information: Histone ChIP-seq

spike-ins

Nucleic acid fragments of known sequence and quantity used for calibration in high-throughput sequencing.

File formats: fasta

splice junctions

Genomic locations of exon-exon boundaries in transcripts.

File formats: tsv

transcript quantifications

Counts (reads in single-end, or read pairs in paired-end, sequencing runs) that map to individual transcript isoforms (these may include spike-ins).

File formats: tsv, bigBed

Additional information: Long read RNA-seq

transcription start sites

An annotation or set of regions that are identified as transcription start sites (TSS) in the genome.

File formats: bed, bigBed, gff, gtf

Additional information: RAMPAGE and CAGE

transcriptome alignments

The mapping locations of input reads with respect to the transcriptome.

File formats: bam

transcriptome annotations

Genomic coordinates of transcripts and their known or novel status as compared to reference annotation.

File formats: gtf

transcriptome index

A preprocessed form of the transcriptome reference used to facilitate downstream analysis.

File formats: idx, database

transcriptome reference

The transcriptomic sequence of an idealized representative individual in a species.

File formats: tsv

unfiltered alignments

The mapping locations of input reads with respect to a genome or other provided reference without any filtering (such as removing duplicates).

File formats: bam

Target categories

broad histone mark

Category of histone modifications that are frequently detected across relatively long continuous stretches of DNA. This category includes H3F3A, H3K27me3, H3K36me3, H3K4me1, H3K79me2, H3K79me3, H3K9me1, H3K9me2, H4K20me1.

chromatin remodeler

Proteins involved in any process that results in the specification, formation or maintenance of the physical structure of eukaryotic chromatin. For example, members of a protein complex that possesses histone deacetylase activity.

cofactor

A protein or a member of a complex that interacts specifically and non-covalently with a DNA-bound DNA-binding transcription factor to initiate, activate or repress gene transcription.

cohesin

Proteins involved in a cell cycle process in which the sister chromatids of a replicated chromosome become tethered to each other.

control

Non-specific targets or mock targets which serve as a placeholder for experiments that are used as controls.

DNA repair

Proteins involved in a process of restoring DNA after damage. A variety of different DNA repair pathways have been reported that include direct reversal, base excision repair, nucleotide excision repair, photoreactivation, bypass, double-strand break repair pathway, and mismatch repair pathway.

DNA replication

Proteins involved in a cellular metabolic process in which a cell duplicates one or more molecules of DNA.

histone

Protein members of a complex composed of DNA wound around a multisubunit core and associated proteins, which forms the primary packing unit of DNA in the nucleus into higher order structures.

narrow histone mark

Category of histone modifications that are frequently detected across relatively short continuous stretches of DNA. This category includes H2AFZ, H3ac, H3K27ac, H3K4me2, H3K4me3, H3K9ac.

other context

Protein with a unique function that does not belong to any other category.

recombinant protein

Genetically modified proteins. One common type of modification is epitope tagging.

RNA binding protein

Targets which interact selectively and non-covalently with RNA molecules.

RNA polymerase complex

Components of one of the three nuclear DNA-directed RNA polymerase complexes found in all eukaryotes, including RNA polymerase I, II, and III.

tag

Short peptides that can be fused to proteins of interest and serve as antigens for antibodies.

transcription factor

A protein or a member of a complex that interacts selectively and non-covalently with chromatin or a specific DNA sequence (sometimes referred to as a motif) to modulate gene transcription.

candidate Cis-Regulatory Element (cCRE) subtypes

candidate Cis-Regulatory Elements (cCREs)

Elements classified by integrative analysis of biochemical signatures such as histone modifications to fall within distinct regulatory categories (see subtypes in this section).

CTCF-only

Elements (cCREs) defined by a signature of high DNase and CTCF signal and low H3K4me3 and H3K27ac signal.

distal enhancer-like (dELS)

Elements (cCREs) putatively labeled as enhancers, defined by a signature of high DNase and H3K27ac signal. They are denoted distal due to their presence outside 2 kb of an annotated GENCODE transcription start site (TSS).

DNase-H3K4me3

Elements (cCREs) that are defined by a signature of high DNase and H3K4me3 signal and low H3K27ac signal, and that fall outside 200 bp of an annotated GENCODE transcription start site (TSS).

promoter-like (PLS)

Elements (cCREs) putatively labeled as promoters, defined by a signature of high DNase and H3K4me3 signal, that lie within 200 bp of an annotated GENCODE transcription start site (TSS).

proximal enhancer-like (pELS)

Elements (cCREs) putatively labeled as enhancers, defined by a signature of high DNase and H3K27ac signal. They are denoted proximal due to their presence within 2 kb of an annotated GENCODE transcription start site (TSS).

representative DNase hypersensitivity sites (rDHS)

Regions of chromatin sensitive to DNase cleavage, called DNase hypersensitivity sites (DHS), collected from many DNase-seq datasets for an organism which are then iteratively clustered and filtered to select a list of non-overlapping representative sites in that organism.
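
The signature-based subtypes in this section (CTCF-only, dELS, DNase-H3K4me3, PLS, and pELS) can be sketched as a small decision function. This is an illustration under stated assumptions, not the classification procedure itself: each signal is reduced to a boolean "high" flag, distances at exactly 200 bp or 2 kb are treated as "within", and the order in which the rules are checked (PLS first, then enhancer-like, then DNase-H3K4me3, then CTCF-only) is an assumption of this example. The function name `classify_ccre` is hypothetical.

```python
def classify_ccre(dnase, h3k4me3, h3k27ac, ctcf, tss_distance_bp):
    """Assign a cCRE subtype from high/low signal flags and TSS distance.

    Each signal argument is True when that signal is high.
    `tss_distance_bp` is the distance to the nearest annotated TSS.
    Returns None when no subtype definition from the glossary applies.
    """
    if not dnase:
        return None  # every subtype in this section requires high DNase
    if h3k4me3 and tss_distance_bp <= 200:
        return "PLS"
    if h3k27ac:
        return "pELS" if tss_distance_bp <= 2000 else "dELS"
    if h3k4me3:
        # High H3K4me3, low H3K27ac, and outside 200 bp of a TSS.
        return "DNase-H3K4me3"
    if ctcf:
        # Low H3K4me3 and low H3K27ac with high CTCF.
        return "CTCF-only"
    return None


print(classify_ccre(True, True, False, False, 150))
print(classify_ccre(True, False, True, False, 5000))
```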