RAMPAGE and CAGE Data Standards and Processing Pipeline

Assay Overview

 

RAMPAGE (RNA Annotation and Mapping of Promoters for the Analysis of Gene Expression) is a sequencing approach designed to identify transcription start sites (TSSs) at base-pair resolution, quantify their expression, and characterize their transcripts. The assay uses direct cDNA evidence to link specific genes and their regulatory TSSs 1.
 

Updated June 2017

Pipeline Overview

The RAMPAGE pipeline was developed as part of the ENCODE Uniform Processing Pipelines series. The full RAMPAGE pipeline code is freely available on GitHub and can be run on DNAnexus (link requires account creation) at their current pricing.

The ENCODE-developed pipeline for RAMPAGE assays is also used for the analysis of CAGE (Cap Analysis Gene Expression) and can process libraries generated using rRNA-depleted total RNA >200 nucleotides in size. The CAGE method is intended to provide information on the 5' end of mRNA and, by extension, on TSSs; RAMPAGE is an improvement on the CAGE method 1.

Pipeline schematic

View the current instance of this pipeline

Inputs:

File format | Information contained in file | File description | Notes
fastq | reads | Paired-end, g-zipped DNA-sequencing reads | Reads must meet the criteria outlined under the Uniform Processing Pipeline Restrictions.
fastq | control reads | A Bismark-transformed, Bowtie-indexed genome |
tar | genome index | G-zipped STAR genome index |

View RAMPAGE library structure overview.

Outputs:

File format | Information contained in file | File description | Notes
bam | alignments | Produced by mapping reads to the genome |
bigWig | signal | Signals are generated both for unique reads and for unique+multimapping reads. If data are stranded, unique and unique+multimapping signals are produced for each strand (minus and plus). If the data are unstranded, signals are created without attention given to individual strands. |
bed tss_peak, bigBed tss_peak, gff | transcription start sites (TSS) | Raw peak files for each replicate |
bed idr_peak, bigBed idr_peak | consensus transcription start sites (TSS) | IDR comparison of TSSs generated from individual replicates. | Irreproducible Discovery Rate (IDR): compares two peak files (bed), typically from a pair of replicates of the same experiment, allowing validation of the experiment methods and reducing noise in the final results.
tsv | gene quantifications | |

Quality control metrics are also generated by comparing two TSS quantification files and calculating the Mean Absolute Deviation and the correlations between them.
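As a rough illustration of this kind of replicate comparison, the sketch below computes Pearson and Spearman correlations and a mean absolute deviation for two lists of per-TSS values. It is not the pipeline's own QC code. The function names, the pseudocount, and the choice to take the deviation over log2 ratios are all assumptions made for the example; the pipeline's exact definition is not given here.

```python
import math
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_absolute_deviation(x, y, pseudocount=1.0):
    """Mean absolute deviation of log2 ratios around their mean.

    ASSUMPTION: this definition and the pseudocount are illustrative only.
    """
    ratios = [math.log2((a + pseudocount) / (b + pseudocount)) for a, b in zip(x, y)]
    centre = mean(ratios)
    return mean(abs(r - centre) for r in ratios)


# Hypothetical per-TSS values for two replicates, in the same TSS order.
rep1 = [10, 200, 35, 0, 80]
rep2 = [12, 180, 40, 1, 75]
print(pearson(rep1, rep2), spearman(rep1, rep2), mean_absolute_deviation(rep1, rep2))
```

In practice the two lists would be read from the quantification files after matching rows on a shared TSS or gene identifier.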

References

Genomic References

View the mapping assembly and genome annotation reference files used in this pipeline

These pipelines require both assembly information for the species of interest and a gene reference. Each of the main programs (TopHat, STAR, and RSEM) creates an index for use in subsequent steps. More information on the use of RSEM is available here.

Links and Publications

Find RAMPAGE and CAGE data generated by this pipeline
Explore publications related to RAMPAGE and CAGE

1. Batut, Philippe et al. “High-fidelity promoter profiling reveals widespread alternative promoter usage and transposon-driven developmental gene expression.” Genome Research 23.1 (2013): 169–180. PMC. Web. 9 Feb. 2016.

Uniform Processing Pipeline Restrictions

  • Sequencing must be paired-end.
  • The read length should be a minimum of 50 base pairs.
  • All Illumina platforms are supported for use in the uniform pipeline; colorspace (SOLiD) reads are not supported.
  • Barcodes and spike-ins, if present in the fastq, must be indicated.
  • Each RAMPAGE or CAGE experiment must have a corresponding RNA-seq experiment as a control. 
  • Library insert size range must be indicated.
  • Reads are mapped to either the GRCh38 (human) or mm10 (mouse) assembly.
  • Gene and transcript quantification files are annotated to either GENCODE V24 (human) or M4 (mouse).
  • For IDR comparison, the experiment must have two and only two replicates.
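Several of these restrictions can be checked mechanically before a submission. The sketch below is a minimal illustration and not part of the ENCODE pipeline. The `Experiment` record and all of its field names are hypothetical, and the barcode and spike-in requirement is left out because it depends on how that metadata is recorded.

```python
from dataclasses import dataclass
from typing import List, Optional

MIN_READ_LENGTH = 50                      # base pairs, from the restrictions list
SUPPORTED_ASSEMBLIES = {"GRCh38", "mm10"}


@dataclass
class Experiment:
    """Hypothetical metadata record; the field names are illustrative only."""
    paired_end: bool
    read_length: int
    platform: str                         # e.g. "Illumina" or "SOLiD"
    assembly: str
    has_rnaseq_control: bool
    insert_size_range: Optional[str]      # None if the range was not indicated
    replicate_count: int


def restriction_violations(exp: Experiment) -> List[str]:
    """Return a human-readable message for each restriction that is not met."""
    problems = []
    if not exp.paired_end:
        problems.append("sequencing must be paired-end")
    if exp.read_length < MIN_READ_LENGTH:
        problems.append(f"read length {exp.read_length} is below {MIN_READ_LENGTH} bp")
    if exp.platform.lower() == "solid":
        problems.append("colorspace (SOLiD) reads are not supported")
    if exp.assembly not in SUPPORTED_ASSEMBLIES:
        problems.append(f"unsupported assembly: {exp.assembly}")
    if not exp.has_rnaseq_control:
        problems.append("a corresponding RNA-seq control experiment is required")
    if exp.insert_size_range is None:
        problems.append("library insert size range must be indicated")
    return problems


def idr_eligible(exp: Experiment) -> bool:
    """IDR comparison requires exactly two replicates."""
    return exp.replicate_count == 2


example = Experiment(True, 36, "Illumina", "GRCh38", True, None, 3)
for message in restriction_violations(example):
    print(message)
print("IDR eligible:", idr_eligible(example))
```

Returning a list of messages instead of stopping at the first failure lets a submitter see every problem in one pass.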

Current Standards (RAMPAGE)

Experimental guidelines for RAMPAGE experiments can be found here.

  • Experiments should have at least two replicates.
  • Each replicate should have 20 million aligned reads. Older projects aimed for 10 million aligned reads.
  • Each RAMPAGE experiment should have a corresponding RNA-seq experiment as a control.
  • Replicate concordance: the gene-level quantification should have a Spearman correlation of >0.9 between isogenic replicates and >0.8 between anisogenic replicates.
  • The experiment must pass routine metadata audits in order to be released.
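The numeric standards above can be summarized as a small set of threshold checks. The following is a sketch only: the function name, its arguments, and the report keys are hypothetical, and the 20 million figure is treated as a per-replicate minimum, which is an assumption. The metadata audit is not something this example can evaluate.

```python
MIN_REPLICATES = 2
MIN_ALIGNED_READS = 20_000_000            # per replicate; treated as a minimum (assumption)
SPEARMAN_THRESHOLDS = {"isogenic": 0.9, "anisogenic": 0.8}


def standards_report(aligned_reads, spearman_correlation, replication_type):
    """Check replicate count, read depth, and replicate concordance.

    aligned_reads: list of aligned read counts, one entry per replicate.
    spearman_correlation: gene-level Spearman correlation between replicates.
    replication_type: "isogenic" or "anisogenic".
    """
    threshold = SPEARMAN_THRESHOLDS[replication_type]
    return {
        "enough_replicates": len(aligned_reads) >= MIN_REPLICATES,
        "enough_depth": all(n >= MIN_ALIGNED_READS for n in aligned_reads),
        # The standard is stated as a strict inequality (> threshold).
        "concordant": spearman_correlation > threshold,
    }


report = standards_report([25_000_000, 12_000_000], 0.85, "anisogenic")
for check, passed in report.items():
    print(check, "PASS" if passed else "FAIL")
```

Keeping each check as a separate entry makes it clear which standard an experiment misses, for example adequate concordance but insufficient depth in one replicate.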