Target categorization

Definition of ENCODE target categories

Category | Definition
control | Non-specific targets or mock targets which serve as a placeholder for experiments that are used as controls.
histone | Protein members of a complex composed of DNA wound around a multisubunit core and associated proteins, which forms the primary packing unit of DNA in the nucleus into higher order structures.
broad histone mark | Category of histone modifications that are frequently detected across relatively long continuous stretches of DNA. This category includes H3F3A, H3K27me3, H3K36me3, H3K4me1, H3K79me2, H3K79me3, H3K9me1, H3K9me2, H4K20me1.
narrow histone mark | Category of histone modifications that are frequently detected across relatively short continuous stretches of DNA. This category includes H2AFZ, H3ac, H3K27ac, H3K4me2, H3K4me3, H3K9ac.
chromatin remodeler | Proteins involved in any process that results in the specification, formation or maintenance of the physical structure of eukaryotic chromatin. For example, members of a protein complex that possesses histone deacetylase activity.
recombinant protein | Genetically modified proteins. One common type of modification is epitope tagging.
tag | Short peptides that can be fused to proteins of interest and serve as antigens for antibodies.
RNA binding protein | Targets which interact selectively and non-covalently with RNA molecules.
cofactor | A protein or a member of a complex that interacts specifically and non-covalently with a DNA-bound DNA-binding transcription factor to initiate, activate or repress gene transcription.
cohesin | Proteins involved in a cell cycle process in which the sister chromatids of a replicated chromosome become tethered to each other.
DNA replication | Proteins involved in a cellular metabolic process in which a cell duplicates one or more molecules of DNA.
DNA repair | Proteins involved in a process of restoring DNA after damage. A variety of different DNA repair pathways have been reported that include direct reversal, base excision repair, nucleotide excision repair, photoreactivation, bypass, double-strand break repair pathway, and mismatch repair pathway.
RNA polymerase complex | Components of one of the three nuclear DNA-directed RNA polymerase complexes found in all eukaryotes, including RNA polymerase I, II and III.
transcription factor | A protein or a member of a complex that interacts selectively and non-covalently with chromatin or a specific DNA sequence (sometimes referred to as a motif) to modulate gene transcription.
other context | Protein with a unique function that does not belong to any other category.

How does ENCODE categorize targets based on GO annotation?

To assign appropriate categories to each human, mouse, fruit fly or worm target, we first check the GO annotations of the target’s corresponding gene. We then cross-reference each annotated GO term and its parent term with the GO terms in the following table:

ENCODE Label | GO Term ID | GO Term Name
TF | GO:0003700 | DNA-binding transcription factor activity
RBP | GO:0003723 | RNA binding
RBP | GO:0001070 | RNA-binding transcription regulator activity
Cofactor | GO:0008134 | transcription factor binding
Cofactor | GO:0003712 | transcription coregulator activity
RNAP | GO:0005736 | RNA polymerase I complex
RNAP | GO:0016591 | RNA polymerase II, holoenzyme
RNAP | GO:0005666 | RNA polymerase III complex
Chromatin remodeler | GO:0006325 | chromatin organization
Chromatin remodeler | GO:0000118 | histone deacetylase complex
Cohesin | GO:0007062 | sister chromatid cohesion
DNA replication | GO:0006260 | DNA replication
DNA repair | GO:0006281 | DNA repair
histone | GO:0000788 | nuclear nucleosome
TF** | GO:0003677 | DNA binding
TF** | GO:0003682 | chromatin binding
TF** | GO:0043167 | ion binding
TF**: only used when there is no other data
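
The lookup described above can be sketched in Python as follows. This is only an illustration, not ENCODE's actual code: the names PRIMARY_TERMS, FALLBACK_TERMS and match_labels are hypothetical, and the sketch assumes the caller has already collected the gene's GO term IDs together with their parent terms. It also reads the TF** rule as "consult the fallback terms only if no other term in the table matched", which is one possible interpretation.

```python
# GO term ID -> ENCODE label, transcribed from the table above.
PRIMARY_TERMS = {
    "GO:0003700": "TF",
    "GO:0003723": "RBP",
    "GO:0001070": "RBP",
    "GO:0008134": "Cofactor",
    "GO:0003712": "Cofactor",
    "GO:0005736": "RNAP",
    "GO:0016591": "RNAP",
    "GO:0005666": "RNAP",
    "GO:0006325": "Chromatin remodeler",
    "GO:0000118": "Chromatin remodeler",
    "GO:0007062": "Cohesin",
    "GO:0006260": "DNA replication",
    "GO:0006281": "DNA repair",
    "GO:0000788": "histone",
}

# TF** rows: assumed here to apply only when no primary term matches.
FALLBACK_TERMS = {
    "GO:0003677": "TF",
    "GO:0003682": "TF",
    "GO:0043167": "TF",
}


def match_labels(term_ids):
    """Return the ENCODE labels matched by a gene's GO terms.

    term_ids is assumed to hold the annotated GO IDs plus their parent terms.
    """
    primary = [PRIMARY_TERMS[t] for t in term_ids if t in PRIMARY_TERMS]
    if primary:
        return primary
    # No primary match: fall back to the TF** terms (an assumption of this sketch).
    return [FALLBACK_TERMS[t] for t in term_ids if t in FALLBACK_TERMS]
```

For example, match_labels(["GO:0003677", "GO:0006281"]) returns ["DNA repair"], because the DNA binding term is ignored once a primary term has matched.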

Next, we assign a score to each label based on the evidence code of the matching GO annotation:

Evidence code | Score
EXP | 2
IDA | 2
IMP | 2
IGI | 2
IEP | 2
HTP | 1
HDA | 1
HMP | 1
HGI | 1
HEP | 1
TAS | 1
IEA | -1
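
As a rough sketch, the score table can be held in a dictionary. The names EVIDENCE_SCORES and evidence_score are hypothetical. The table does not say how evidence codes outside this list are handled, so the default of 0 below is purely an assumption of the sketch.

```python
# Evidence code -> score, transcribed from the table above.
EVIDENCE_SCORES = {
    "EXP": 2, "IDA": 2, "IMP": 2, "IGI": 2, "IEP": 2,
    "HTP": 1, "HDA": 1, "HMP": 1, "HGI": 1, "HEP": 1,
    "TAS": 1,
    "IEA": -1,
}


def evidence_score(code, default=0):
    """Look up the score for a GO evidence code.

    Codes missing from the table get `default`; that choice is an
    assumption, since the source does not define their treatment.
    """
    return EVIDENCE_SCORES.get(code, default)
```

With this table, an experimentally supported annotation (for example IDA) contributes 2, while an electronically inferred one (IEA) subtracts 1.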

Finally, we sum the scores of identical labels for the target. The label or labels with the highest total score are assigned to the target as its category.
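
The summing and selection step might look like the sketch below. The function name assign_categories and the (label, score) input format are hypothetical. How ENCODE treats targets with no matching annotation is not stated here, so returning an empty list in that case is an assumption.

```python
from collections import defaultdict


def assign_categories(scored_labels):
    """Sum scores per label and return the label(s) with the highest total.

    scored_labels is an iterable of (label, score) pairs, one per matching
    GO annotation. Ties return every top-scoring label, sorted by name.
    """
    totals = defaultdict(int)
    for label, score in scored_labels:
        totals[label] += score
    if not totals:
        return []  # assumption: no matching annotations means no category
    best = max(totals.values())
    return sorted(label for label, total in totals.items() if total == best)
```

For example, assign_categories([("TF", 2), ("Cofactor", 1), ("TF", -1), ("Cofactor", 1)]) returns ["Cofactor"]: the TF total is 1 and the Cofactor total is 2.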