scATAC-seq Data Standards and Processing Pipeline

Assay overview

A scATAC-seq (single-cell Assay for Transposase-Accessible Chromatin followed by sequencing) experiment provides genome-wide profiles of chromatin accessibility at single-cell or single-nucleus resolution.

 

Pipeline overview

 

The ENCODE single-cell / single-nucleus ATAC-seq pipeline was developed by the Kundaje lab at Stanford University. The original documentation for this pipeline, which is duplicated in part below, is available here.

The ENCODE single-cell / single-nucleus ATAC-seq pipeline comprises an upstream automated stage and a downstream manual stage. This document covers the automated portion of the pipeline, which performs barcode error correction, mapping using Bowtie2, filtering, and fragment file generation. ArchR is run on the fragment file to generate an initial summary of the dataset, including multiple QC metrics. 
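To illustrate the barcode error correction step listed above, here is a minimal Python sketch. It assumes a common strategy: an observed barcode is accepted if it is on a whitelist, or rescued if exactly one whitelisted barcode differs from it by a single substitution. The function names and the one-mismatch threshold are hypothetical and are not taken from the pipeline's code.

```python
from typing import Iterable, Optional

BASES = "ACGT"  # substitutions tried at each position of the observed barcode


def build_whitelist(barcodes: Iterable[str]) -> frozenset:
    """Store the known barcodes in a set for constant-time lookups."""
    return frozenset(barcodes)


def correct_barcode(observed: str, whitelist: frozenset) -> Optional[str]:
    """Return the matching whitelisted barcode, or None if there is no
    match or more than one whitelisted barcode is one substitution away.

    Assumption (not from the pipeline documentation): at most one
    substitution is tolerated, and ambiguous barcodes are discarded.
    """
    if observed in whitelist:
        return observed

    candidates = set()
    for i, original in enumerate(observed):
        for base in BASES:
            if base == original:
                continue
            candidate = observed[:i] + base + observed[i + 1:]
            if candidate in whitelist:
                candidates.add(candidate)

    if len(candidates) == 1:
        return next(iter(candidates))
    return None


if __name__ == "__main__":
    whitelist = build_whitelist(["AAAA", "CCCC", "AATT"])
    for barcode in ["AAAA", "CCCG", "AAAT", "GGGG"]:
        print(barcode, "->", correct_barcode(barcode, whitelist))
```

Discarding ambiguous barcodes, rather than picking one neighbor arbitrarily, avoids assigning reads to the wrong cell; a production implementation might instead weigh candidates by base quality.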

The GitHub repository for the automated stage can be found here.

The pipeline currently supports the following platforms:

  • 10x sc/snATAC-seq
  • The ATAC-seq portion of 10x single-cell multiome (RNA+ATAC)
  • Split-pool scATAC-seq from the Bing Ren lab

The data will be displayed on the ENCODE portal as follows:

  • The automated processing outputs for each dataset will be attached to existing scATAC experiment objects with raw data.
  • Multiple manual processing analysis objects can be attached to each automated processing experiment object. Manual analyses can mix and match multiple automated processing objects, thereby supporting integrative analysis across modalities and/or datasets.

The downstream manual stage (not implemented or specified in this document) involves refinement and annotation of clusters by domain experts, as well as integration of related experiments. A completely automated end-to-end pipeline is not yet feasible for sc/snATAC-seq data because defining cell states, trajectories, and similar features from these data requires many subjective decisions. Furthermore, a sc/snATAC-seq dataset may be part of multiple integrative analyses that also require subjective decisions which cannot be fully automated in advance (e.g. integration with scRNA-seq from a multiome assay or integration with other datasets). Separating the manual and automated stages gives the flexibility to attach multiple fine-tuned downstream analyses to one or more collections of uniformly processed data.

 

Pipeline schematic

View the current instance of this pipeline

Inputs:

File format | Information contained in file | File description | Notes
------------|-------------------------------|------------------|------
fastq | reads | G-zipped scATAC-seq reads | Reads must meet the criteria outlined in the Uniform Processing Pipeline Restrictions.
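For readers who want to inspect input files before submission, the following minimal Python sketch iterates over the records of a gzipped FASTQ file using only the standard library. It assumes the usual four-line FASTQ record layout; the function name `read_fastq` is hypothetical, and the checks shown are basic sanity checks, not the criteria of the Uniform Processing Pipeline Restrictions.

```python
import gzip
from typing import Iterator, Tuple


def read_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read name, sequence, quality string) from a gzipped FASTQ file.

    Assumes each record spans exactly four lines:
    '@name', sequence, '+' separator, quality.
    """
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline().rstrip("\n")
            if not header:
                return  # end of file
            sequence = handle.readline().rstrip("\n")
            handle.readline()  # '+' separator line, ignored
            quality = handle.readline().rstrip("\n")
            if not header.startswith("@"):
                raise ValueError(f"Malformed FASTQ header: {header!r}")
            if len(sequence) != len(quality):
                raise ValueError(f"Sequence/quality length mismatch in {header!r}")
            yield header[1:], sequence, quality
```

A typical use would be `for name, seq, qual in read_fastq("reads.fastq.gz"): ...`, where the file name is a placeholder.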

 

Outputs:

File format | Information contained in file | File description | Notes
------------|-------------------------------|------------------|------
fastq | filtered reads | |
bam | unfiltered alignments | Produced by mapping reads to the genome. |
bam | alignments | Produced by mapping reads to the genome. |
tar | fragments | |
tar | ArchR project | |
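As a rough illustration of how a fragment file can feed per-cell QC summaries, the sketch below counts fragments per barcode. It assumes the tab-separated layout commonly used for scATAC-seq fragment files (chromosome, start, end, barcode, read count); this page does not specify the layout of the files inside the tar archive, so treat the column order and the function name `count_fragments_per_barcode` as assumptions.

```python
import io
from collections import Counter
from typing import Iterable


def count_fragments_per_barcode(lines: Iterable[str]) -> Counter:
    """Count fragments per cell barcode.

    Assumed columns (tab-separated): chrom, start, end, barcode, [count].
    Blank lines and lines starting with '#' are skipped.
    """
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end, barcode = line.rstrip("\n").split("\t")[:4]
        if int(end) <= int(start):
            raise ValueError(f"Invalid fragment interval: {chrom}:{start}-{end}")
        counts[barcode] += 1
    return counts


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    example = io.StringIO(
        "chr1\t100\t250\tAAAC\t1\n"
        "chr1\t300\t420\tAAAC\t2\n"
        "chr2\t50\t90\tGGGT\t1\n"
    )
    for barcode, n in count_fragments_per_barcode(example).most_common():
        print(barcode, n)
```

The function accepts any iterable of text lines, so it can be applied to a handle opened with `gzip.open(path, "rt")` once the fragment file has been extracted from the archive.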

References

Genomic References

View the genome references and chromosome sizes used in this pipeline for human and mouse.

 

Links and Publications

Find data generated by this pipeline: Search results