Joint annotation of chromatin state and chromatin conformation reveals relationships among domain types and identifies domains of cell-type-specific expression.

Libbrecht MW, Ay F, Hoffman MM, Gilbert DM, Bilmes JA, Noble WS.

Genome Research. 2015 Apr;25(4):544-57.

Abstract: The genomic neighborhood of a gene influences its activity, a behavior that is attributable in part to domain-scale regulation. Previous genomic studies have identified many types of regulatory domains. However, due to the difficulty of integrating genomics data sets, the relationships among these domain types are poorly understood. Semi-automated genome annotation (SAGA) algorithms facilitate human interpretation of heterogeneous collections of genomics data by simultaneously partitioning the human genome and assigning labels to the resulting genomic segments. However, existing SAGA methods cannot integrate inherently pairwise chromatin conformation data. We developed a new computational method, called graph-based regularization (GBR), for expressing a pairwise prior that encourages certain pairs of genomic loci to receive the same label in a genome annotation. We used GBR to exploit chromatin conformation information during genome annotation by encouraging positions that are close in 3D to occupy the same type of domain. Using this approach, we produced a model of chromatin domains in eight human cell types, thereby revealing the relationships among known domain types. Through this model, we identified clusters of tightly regulated genes expressed in only a small number of cell types, which we term "specific expression domains." We found that domain boundaries marked by promoters and CTCF motifs are consistent between cell types even when domain activity changes. Finally, we showed that GBR can be used to transfer information from well-studied cell types to less well-characterized cell types during genome annotation, making it possible to produce high-quality annotations of the hundreds of cell types with limited available data.
References: PMID:25677182

Related data

Available data: genomic annotations
File format: BED
Data summary: Semi-automated genome annotation algorithms take as input a collection of genomics data sets and simultaneously partition the genome and label each segment with an integer, such that positions with the same label have similar patterns of activity. These algorithms are “semi-automated” because a human assigns a functional interpretation to the labels after the annotation process. We used graph-based regularization to incorporate Hi-C data, based on the observation that positions close in 3D tend to occupy the same type of domain. Using this approach, we produced a model of five chromatin domain types in eight human cell types: constitutive heterochromatin (CON), facultative heterochromatin (FAC), quiescent domains (QUI), broad expression (BRD), and specific expression (SPC). This model reveals the relationships among known domain types.
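
As a rough illustration of how the BED annotations might be explored, the following Python sketch tallies how many bases are covered by each domain label. It assumes, hypothetically, that the label abbreviation (CON, FAC, QUI, BRD, SPC) is stored in the fourth (name) column of each BED record; the actual column layout of the distributed files may differ, and the function name `bases_per_label` and the three example records are invented for illustration.

```python
import io
from collections import Counter

# Label abbreviations taken from the data summary above.
DOMAIN_LABELS = {
    "CON": "Constitutive heterochromatin",
    "FAC": "Facultative heterochromatin",
    "QUI": "Quiescent domains",
    "BRD": "Broad expression",
    "SPC": "Specific expression",
}


def bases_per_label(bed_lines):
    """Sum segment lengths per label from an iterable of BED lines.

    Assumption: column 4 (the BED "name" field) holds the domain label.
    BED coordinates are 0-based and half-open, so length = end - start.
    """
    totals = Counter()
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        start, end, label = int(fields[1]), int(fields[2]), fields[3]
        totals[label] += end - start
    return totals


# Made-up records, for illustration only (not real annotation data).
example = io.StringIO(
    "chr1\t0\t1000\tQUI\n"
    "chr1\t1000\t4000\tSPC\n"
    "chr1\t4000\t4500\tQUI\n"
)
totals = bases_per_label(example)
for label, n_bases in sorted(totals.items()):
    print(f"{label} ({DOMAIN_LABELS.get(label, 'unknown')}): {n_bases} bp")
```

To run the same tally on a downloaded file, an open file handle can be passed in place of the `io.StringIO` example, since the function only iterates over lines.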