To identify functional elements, the ENCODE and modENCODE projects have collected thousands of datasets, each consisting of at least two replicates. Biochemical validation of every individual dataset is impractical. As an alternative, the consortia are characterizing the platforms used to comprehensively identify these functional elements. In some cases, this involves orthogonal testing of the platforms used to collect the data that form the catalogs of functional elements. For ChIP datasets, ChIP-seq has been compared with ChIP-chip, and the detection of spiked-in standards has been analyzed. For RNA-seq, the behavior of spike-in standards has been examined. In addition to the standard ENCODE antibody characterization efforts, a systematic analysis of histone antibodies has been performed. Summaries of these individual studies and links to the publications are below.
Platform characterization studies
Systematic evaluation of factors influencing ChIP-seq fidelity.
Chen Y, Negre N, Li Q, Mieczkowska JO, Slattery M, Liu T, Zhang Y, Kim TK, He HH, Zieba J et al. Nat Methods. 2012 Apr 22; 9:609-14. doi: 10.1038/nmeth.1985; PMID: 22522655
Key platform characterization findings include that ChIP-seq data were biased toward open chromatin regions, which led to false positives if not corrected. Removal of reads originating at the same base reduced false positives but had little effect on detection sensitivity. Even at a coverage depth of ~1 read per base pair, ~1% of the narrow peaks detected on a tiling array were missed by ChIP-seq, mostly owing to low mappability of these regions. For broad histone marks such as H3K36me3, the regularly adopted sequencing depth of 15-20 million reads may be insufficient to identify the vast majority of enriched regions in humans. There were notable variations in sensitivity and specificity between the algorithms under evaluation; some algorithms behaved unexpectedly at high sequencing depths.
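The duplicate-read filter mentioned above can be illustrated with a minimal sketch. It is not the procedure used in the study: the read representation (chromosome, 5' start position, strand), the choice to treat strands separately, and the function name collapse_duplicate_reads are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical read representation: (chromosome, 5' start position, strand).
Read = Tuple[str, int, str]


def collapse_duplicate_reads(reads: Iterable[Read]) -> List[Read]:
    """Keep only the first read seen at each (chromosome, start, strand).

    Assumption: reads on opposite strands that share a start coordinate
    are not treated as duplicates of each other.
    """
    seen = set()
    kept = []
    for read in reads:
        if read not in seen:
            seen.add(read)
            kept.append(read)
    return kept


# Made-up example reads, for illustration only.
reads = [
    ("chr2L", 1000, "+"),
    ("chr2L", 1000, "+"),  # same start and strand: dropped as a duplicate
    ("chr2L", 1000, "-"),  # same start, opposite strand: kept
    ("chr2L", 1004, "+"),
]
unique_reads = collapse_duplicate_reads(reads)
print(len(reads), "reads in,", len(unique_reads), "reads kept")
```

In practice this filtering is usually applied to aligned reads before peak calling, so that stacks of identical reads do not inflate apparent enrichment.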
Application of NanoString nCounter technology to the Validation of ChIP-Seq datasets in the ENCODE project.
Epstein CB, Goren A, Gymrek M, Ernst J, Shoresh N, Zhang X, Issner R, Coyne M, Amit I, Regev A et al.
The key platform characterization finding is that there is good agreement between ChIP detection by NanoString nCounter technology and high-throughput DNA sequencing. A custom NanoString codeset was devised that samples many ENCODE chromatin states. For ChIP experiments using 14 different antibodies, there was generally high concordance between the two detection methods. In addition, there was high concordance between NanoString assay of ChIP and high-throughput DNA sequencing libraries prepared from the same ChIP, for the 5 antibodies tested. These results indicate that high-throughput DNA sequencing maintains a robust representation of immunoprecipitated material.
Comparison of sequence-specific transcription factor determinations by ChIP-seq and ChIP-qPCR.
Gertz J, Reddy TE, Pauli F, Myers RM.
The key platform characterization finding is that there is good agreement between ChIP-seq and ChIP-qPCR. For each of 12 transcription factors, the enrichment of 44 binding sites was measured by qPCR. There was a high concordance between enrichment results from qPCR and the density of reads in ChIP-seq binding sites. These results indicate that high-throughput DNA sequencing maintains a robust representation of immunoprecipitated material.
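One way to quantify this kind of concordance is a rank correlation between the qPCR enrichment and the ChIP-seq read density measured at the same sites. The summary does not state which statistic the study used, so the Spearman coefficient below is an assumption, and the values are made up purely for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Illustrative (made-up) values for five binding sites.
qpcr_enrichment = [1.2, 8.5, 3.1, 15.0, 0.9]
chipseq_read_density = [4, 60, 22, 95, 6]
rho = spearman(qpcr_enrichment, chipseq_read_density)
print(f"Spearman rho = {rho:.2f}")
```

A rank-based statistic is used here because qPCR enrichment and read density are on different scales and need not be linearly related.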
Synthetic spike-in standards for RNA-seq experiments.
Jiang L, Schlesinger F, Davis CA, Zhang Y, Li R, Salit M, Gingeras TR, Oliver B.
Genome Res. 2011 Sep;21(9):1543-51. Epub 2011 Aug 4. PMID: 21816910; PMCID: PMC3166838
The key platform characterization finding is that, over a wide range, there is a linear correlation between signal (read density) and RNA concentration (input) in RNA-seq experiments. Another key finding is excellent agreement between replicates. A pool of RNA standards (96 different RNAs of various lengths and GC content) spanning a million-fold concentration range was used in this determination. Some bias was found with respect to GC content and fragment length; these biases were reproducible and protocol-dependent.
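The linear relationship between read density and input concentration can be checked with a least-squares fit on log-transformed values. The sketch below is an illustration under stated assumptions, not the analysis from the paper: the function name loglog_fit is hypothetical, and the spike-in values are synthetic, constructed so that read density is exactly proportional to concentration.

```python
import math


def loglog_fit(concentrations, read_densities):
    """Least-squares fit of log10(read density) against log10(concentration).

    Returns (slope, intercept, r). A slope near 1 with r near 1 indicates
    that signal scales proportionally with input over the tested range.
    """
    xs = [math.log10(c) for c in concentrations]
    ys = [math.log10(d) for d in read_densities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Synthetic spike-in series spanning a million-fold range (not real data).
concentrations = [1, 10, 100, 1_000, 10_000, 100_000, 1_000_000]
read_densities = [0.5 * c for c in concentrations]
slope, intercept, r = loglog_fit(concentrations, read_densities)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```

With real spike-in data, systematic deviations from the fitted line as a function of GC content or transcript length would be one way to look for the protocol-dependent biases described above.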
ChIP-chip versus ChIP-seq: lessons for experimental design and data analysis.
Ho JW, Bishop E, Karchenko PV, Negre N, White KP, Park PJ.
BMC Genomics. 2011 Feb 28;12:134. PMID: 21356108; PMCID: PMC3053263
Key platform characterization findings include that ChIP-seq data are generally better than ChIP-chip data with respect to signal-to-noise ratio, number of detected peaks and resolution. While there is strong agreement between the two platforms, the peaks identified using these two platforms can be significantly different, depending on the factor antibody and the analysis pipeline. Identification of binding regions depends on the peak calling pipeline used and is more difficult for factors that are enriched in broad regions. In addition, input DNA libraries used for ChIP-seq can vary, and high-quality input samples sequenced with sufficient depth are important for accurate peak calling.
An assessment of histone-modification antibody quality.
Egelhofer TA, Minoda A, Klugman S, Lee K, Kolasinska-Zwierz P, Alekseyenko AA, Cheung MS, Day DS, Gadel S, Gorchakov AA et al.
Nat Struct Mol Biol. 2011 Jan;18(1):91-3. Epub 2010 Dec 5. PMID: 21131980; PMCID: PMC3017233
Histone modification antibodies were characterized for specificity and for performance in ChIP. More than 25% of the tested antibodies failed specificity tests by dot blot or western blot. More than 20% of the antibodies that passed the specificity test failed in ChIP experiments. A website was developed for posting new results (http://compbio.med.harvard.edu/antibodies/).
Systematic evaluation of variability in ChIP-chip experiments using predefined DNA targets.
Johnson DS, Li W, Gordon DB, Bhattacharjee A, Curry B, Ghosh J, Brizuela L, Carroll JS, Brown M, Flicek P et al.
Genome Res. 2008 Mar;18(3):393-403. Epub 2008 Feb 7. PMID: 18258921; PMCID: PMC2259103
A number of microarray platforms were tested using a spike-in positive control approach and were found to provide results that were consistent with each other. Variance specific to the microarray platform was similar to or smaller than the variance associated with laboratory, protocol and analysis pipeline. Sensitivity was good even at relatively low spike-in levels. Simple repeats and segmental duplications caused false-positive errors in peak detection.