ENCODE Software

All software used or developed by the ENCODE Consortium

Showing 200 of 352 results

  • snrna_pseudobulk — source
    Scripts for generating gene quantifications for pseudobulks.
    Software type: quantification, filtering
    Software
    released
  • subset-bam — source
    subset-bam is a tool to subset a 10x Genomics BAM file based on a tag, most commonly the cell barcode tag.
    Software type: filtering
    Software
    released
  • Topyfic — source
    Topyfic is a Python library designed to apply rLDA to single-cell and bulk RNA-seq data to recover meaningful topics that involve key genes, such as transcription factors, acting at different steps.
    Software type: other
    Software
    released
  • CellBender — source
    CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
    Software type: other
    Software
    released
  • Scrublet — source
    Python code for identifying doublets in single-cell RNA-seq data.
    Software type: other
    Software
    released
  • Distal Regulation E-G correlation — source
    Compute correlation metrics between DNase-seq signal at cCREs with DNase-seq signal at gene promoters or RNA expression levels of genes.
    Software
    released
  • FUNCODE — source
    Scripts for computing Functional Conservation of DNA Elements (FUNCODE) scores from ENCODE DNase-seq, ATAC-seq and Histone ChIP-seq data
    Software
    released
  • Distal regulation ENCODE-rE2G — source
    Train ENCODE-rE2G models on CRISPR enhancer screen data and apply to generate genome-wide predictions of enhancer-gene regulatory connections.
    Software
    released
  • mex_gene_archive — source
    mex_gene_archive is a minimal file format designed to meet the needs of archiving sparse gene matrices in a format compatible with the ENCODE 4 Data Coordination Center.
    Software type: other
    Software
    released
  • CNDBTools — source
    Used to generate the in-silico Hi-C map for each chromosome.
    Software
    released
  • OpenMiChrom — source
    Used to create an ensemble of 3D structures with chromatin dynamics simulation software with input data from the Sequence Annotations (bed file) from PyMEGABASE.
    Software
    released
  • PyMEGABASE — source
    PyMEGABASE is used to generate sequence annotations at the compartment and subcompartment level for physical modeling annotations.
    Software
    released
  • PROcapNet Model Zoo Pipeline — source
    Software for BPNet models using PRO-cap data.
    Software type: machine learning
    Software
    released
  • ProCapNet — source
    Software for BPNet models using PRO-cap data.
    Software type: machine learning
    Software
    released
  • TF ChIP-seq BPNet Model Zoo Pipeline — source
    Placeholder description.
    Software type: machine learning
    Software
    released
  • ATAC-seq DNase-seq ChromBPNet Model Zoo Pipeline — source
    Placeholder description.
    Software type: machine learning
    Software
    released
  • ChromBPNet — source
    ChromBPNet is a fully convolutional neural network that uses dilated convolutions with residual connections to enable large receptive fields with efficient parameterization.
    Software type: machine learning
    Software
    released
  • BPNet — source
    BPNet is a Python package with a CLI to train and interpret base-resolution deep neural networks trained on functional genomics data such as ChIP-nexus or ChIP-seq.
    Software type: machine learning
    Software
    released
  • pyranges
    GenomicRanges for Python.
    Software
    released
  • pandas
    Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
    Software
    released
  • Swan
    Swan is a Python library designed for the analysis and visualization of transcriptomes.
    Software type: other
    Software
    released
  • seqFISH+ — source
    Pipeline to process seqFISH data.
    Software type: quantification, other
    Software
    released
  • ABC-Enhancer-Gene-Prediction — source
    Cell type specific enhancer-gene predictions using ABC model (Fulco, Nasser et al, Nature Genetics 2019)
    Software
    released
  • EPIraction — source
    The EPIraction algorithm uses Tikhonov-regularized least squares models to predict the interacting promoter-enhancer pairs.
    Software
    released
  • AnalyzeSpearATAC
    Software used to analyze Greenleaf lab's SpearATAC (perturbation followed by snATAC-seq) data.
    Software
    released
  • gRNA_to_log2FC — source
    Script for computing a log2FC bigWig from gRNA counts.
    Software
    released
  • GT_Scan — source
    GT-Scan is a web-based tool that scans a user-defined genomic region for candidate targets and ranks them in terms of the number of exact or approximate off-targets in the genome.
    Software
    released
  • Cerberus
    Cerberus software for long-read RNA-seq analysis
    Software type: other
    Software
    released
  • LAPA — source
    Alternative polyadenylation detection from diverse data sources such as 3'-seq, long reads, and short reads.
    Software type: other
    Software
    released
  • CRISPRi-FlowFISH
    Software for the analysis of CRISPRi-FlowFISH data from Engreitz lab.
    Software
    released
  • CRISPy — source
    CRISPy is a lightweight versatile pipeline for CRISPR-screening analysis.
    Software type: quantification
    Software
    released
  • GraphReg — source
    GraphReg (Chromatin interaction aware gene regulatory modeling with graph attention networks) is a graph neural network based gene regulation model which integrates DNA sequence, 1D epigenomic data (such as chromatin accessibility and histone modifications), and 3D chromatin conformation data (such as Hi-C, HiChIP, Micro-C, HiCAR) to predict gene expression in an informative way.
    Software
    released
  • HiCDCPlus — source
    The package HiCDCPlus provides methods to determine significant and differential chromatin interactions by use of a negative binomial generalized linear model, as well as implementations for TopDom to call topologically associating domains (TADs), and Juicer eigenvector to find the A/B compartments. This vignette explains the use of the package and demonstrates typical workflows on HiC and HiChIP data.
    Software
    released
  • Zerone
    Zerone discretizes several ChIP-seq replicates simultaneously and resolves conflicts between them. Publication available at: doi: 10.1093/bioinformatics/btw336
    Software
    released
  • GEM-Tools
    GEM-Tools is a C API and a Python module to support and simplify usage of the GEM Mapper.
    Software
    released
  • Fastx Toolkit — source
    The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.
    Software
    released
  • TRACE
    Transcription Factor Footprinting Using DNase I Hypersensitivity Data and DNA Sequence
    Software
    released
  • psf-to-bedpe — source
    Quick script that converts psf to bedpe.
    Software
    released
  • 3d-dna — source
    We begin with a series of iterative steps whose goal is to eliminate misjoins in the input scaffolds. Each step begins with a scaffold pool (initially, this pool is the set of input scaffolds themselves). The scaffolding algorithm is used to order and orient these scaffolds. Next, the misjoin correction algorithm is applied to detect errors in the scaffold pool, thus creating an edited scaffold pool. Finally, the edited scaffold pool is used as an input for the next iteration of the misjoin correction algorithm. The ultimate effect of these iterations is to reliably detect misjoins in the input scaffolds without removing correctly assembled sequence. After this process is complete, the scaffolding algorithm is applied to the revised input scaffolds, and the output – a single “megascaffold” which concatenates all the chromosomes – is retained for post-processing.
    Software
    released
  • chromVar — source
    chromVAR is an R package for the analysis of sparse chromatin accessibility data from single-cell or bulk ATAC-seq or DNase-seq data.
    Software
    released
  • bioraddbg ATAC-seq MACS2 — source
    This Docker container provides an easy to use Docker interface to MACS2 for peak calling with settings tailored for Bio-Rad Single Cell ATAC-seq chemistry.
    Software
    released
  • bioraddbg ATAC-seq filter beads — source
    This Docker container provides an easy to use Docker interface to a bead filtration tool with settings tailored for Bio-Rad Single Cell ATAC-seq chemistry. This container takes in .BAM files and performs "knee calling" to compute a bead barcode whitelist and jaccard index threshold for bead-to-droplet merging.
    Software
    released
  • bioraddbg ATAC-seq BWA — source
    This Docker container provides an easy to use Docker interface to the BWA alignment tool with settings tailored for Bio-Rad ATAC-Seq chemistry.
    Software
    released
  • bioraddbg ATAC-seq deconvolute — source
    This Docker container provides an easy to use Docker interface to BAP tool with settings tailored for Bio-Rad ATAC-seq chemistry.
    Software
    released
  • guppy_basecaller — source
    Ont-Guppy is a basecalling software available to Oxford Nanopore customers. For more information, please see https://nanoporetech.com/
    Software
    released
  • MAGECK — source
    Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout (MAGeCK) is a computational tool to identify important genes from genome-scale CRISPR-Cas9 knockout screens.
    Software type: other
    Software
    released
  • RELICS — source
    RELICS is an analysis method for discovering functional sequences from tiling CRISPR screens.
    Software type: quantification
    Software
    released
  • polyAsite_workflow — source
    Pipeline to infer poly(A) site clusters through processing of 3' end sequencing libraries prepared according to various protocols.
    Software
    released
  • gencode_utr_fix — source
    This package fixes UTR features in the third column of GENCODE GTF files by converting UTR annotations into five_prime_utr and three_prime_utr, similar to Ensembl.
    Software
    released
  • pyfaidx — source
    This Python module implements pure Python classes for indexing, retrieval, and in-place modification of FASTA files using a samtools-compatible index.
    Software
    released
  • seqkit — source
    A cross-platform and ultrafast toolkit for FASTA/Q file manipulation in Golang
    Software
    released
  • STARsolo — source
    STARsolo is a tool for mapping, demultiplexing, and quantification for single cell RNA-seq.
    Software
    released
  • HTSlib — source
    A C library for reading/writing high-throughput sequencing data
    Software
    released
  • fastp — source
    Tool for preprocessing fastq files
    Software
    released
  • ArchR — source
    R package for single-cell ATAC-seq data analysis
    Software
    released
  • DELTA — source
    Tool to produce chromatin stripes, long range chromatin interactions, and topologically associated domains for Hi-C data.
    Software type: other
    Software
    released
  • SLICE — source
    Subcompartment Landscape Identification via Clustering Enrichments
    Software type: other
    Software
    released
  • interpretation_samples — source
    Interpretation code for Segway samples that produces classifier output and diagnostic plots from the apply_samples.py, for test samples.
    Software type: genome segmentation
    Software
    released
  • Scanpy — source
    Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing.
    Software
    released
  • LR-splitpipe — source
    Demultiplexing and debarcoding tool designed for LR-Split-seq data.
    Software
    released
  • Seurat — source
    Seurat is an R package designed for QC, analysis, and exploration of single-cell RNA-seq data.
    Software
    released
  • split-pipe — source
    The Parse Biosciences computational pipeline is an out-of-the-box software tool that you can run locally to convert fastq files straight to processed data (including gene-cell count matrices). Customers purchasing the Whole Transcriptome Kit will receive access to the Parse computational pipeline.
    Software
    released
  • PINTS — source
    Yu lab repository for signal generation and peak calling scripts
    Software
    released
  • dREG — source
    Detecting Regulatory Elements using GRO-seq and PRO-seq
    Software
    released
  • bigWigMerge — source
    This tool from kentUtils merges together multiple bigWigs into a single output
    Software
    released
  • PRINSEQ Lite — source
    PRINSEQ will preprocess genomic or metagenomic sequence data in FASTA or FASTQ format
    Software
    released
  • sQTLseekeR — source
    sQTLseekeR is a package to detect splicing QTLs (sQTLs), which are variants associated with changes in the splicing pattern of a gene. In sQTLseekeR, splicing patterns are modeled by the relative expression of the transcripts of a gene. The most recent version of sQTLseekeR can be employed to detect genetic variants associated with any multivariate phenotype.
    Software type: variant annotation
    Software
    released
  • ggsashimi — source
    A command-line tool for the visualization of splicing events across multiple samples. Given a specified genomic region, ggsashimi creates sashimi plots for individual RNA-seq experiments as well as aggregated plots for groups of experiments. It uses popular bioinformatics file formats, is annotation-independent, and allows the visualization of splicing events even for large genomic regions by scaling down the genomic segments between splice sites. It is implemented in Python and internally generates R code for plotting.
    Software type: visualization
    Software
    released
  • MPRAmodel — source
    Tool to analyze counts and generate processed files
    Software type: quantification, file format conversion
    Software
    released
  • MPRAcount — source
    Tool to process Tag-seq data and generate the count matrix
    Software type: quantification
    Software
    released
  • MPRAmatch — source
    Tool to identify barcode-oligo pairs
    Software type: utility
    Software
    released
  • Library sequencing match — source
    In-house script that matches guides (from an input list) to the FASTQ files returned by deep sequencing.
    Software type: quantification
    Software
    released
  • FORGE2 — source
    FORGE2 identifies tissue- or cell type-specific signal by analysing a minimum set of 5 single nucleotide polymorphisms (SNPs) for overlap with epigenetic data peaks compared to matched background SNPs and provides both graphical and tabular outputs.
    Software type: integrated analysis
    Software
    released
  • eFORGE — source
    eFORGE identifies tissue or cell type-specific signal by analysing a minimum set of 5 differentially methylated positions (DMPs) for overlap with DNase I hypersensitive sites (DHSs) compared to matched background DMPs and provides both graphical and tabulated outputs.
    Software type: integrated analysis
    Software
    released
  • GenomeStudio — source
    Software developed by Illumina for analysis of microarray data.
    Software type: other
    Software
    released
  • CRISPR screen peak calling — source
    Takes CASA output and makes an ENCODE standard element quantification file.
    Software type: file format conversion
    Software
    released
  • CRISPR screen track builder — source
    Takes guide quantification and builds a browser track perturbation signal file
    Software type: quantification
    Software
    released
  • merge_bcs — source
    This Jupyter notebook merges files of barcodes to create a pass list of barcodes in common between the input files.
    Software
    released
  • ptools_bin — source
    A data-sanitization software allowing raw functional genomics reads to be shared while minimizing privacy leakage, enabling principled privacy-utility trade-offs.
    Software type: other
    Software
    released
  • SCREEN — source
    SCREEN is a web-based visualizer for the ENCODE Registry of cCREs. Users can search for cCREs by genomic region or by associated features such as genes and SNPs, and can also visualize associated underlying annotations from the ground and integrative levels of the ENCODE Encyclopedia such as gene expression, TF ChIP-seq peaks, chromatin states, and cCRE-target gene links. Additionally, users can access ENCODE data on the functional characterization of cCREs.
    Software
    released
  • POSSUM — source
    PCA Of Sparse, SUper Massive Matrices (POSSUM) contains R and C/C++ functions for very fast eigenvector calculation
    Software
    released
  • apricot — source
    apricot implements submodular optimization for the purpose of summarizing massive data sets into minimally redundant subsets that are still representative of the original data. These subsets are useful for both visualizing the modalities in the data and for training accurate machine learning models with just a fraction of the examples and compute.
    Software
    released
  • Pairix — source
    Pairix is a tool for indexing and querying on a block-compressed text file containing pairs of genomic coordinates.
    Software
    released
  • CRADLE — source
    CRADLE (Correcting Read counts and Analysis of DifferentiaLly Expressed regions) is a package that was developed to analyze STARR-seq data. CRADLE removes technical biases from sonication, PCR, mappability and G-quadruplex structure, and generates bigWig files with corrected read counts. CRADLE then uses those corrected read counts to detect both activated and repressed enhancers. CRADLE will help find enhancers with better accuracy and credibility.
    Software
    released
  • IsoSeq3 — source
    IsoSeq3, a tool in the SMRTanalysis software suite available from Pacific Biosciences, contains tools for identifying transcripts (detecting polyA tails and concatemers, read clustering, and deduplication).
    Software
    released
  • Lima — source
    Lima, a tool in the SMRTanalysis software suite available from Pacific Biosciences, removes primers and demultiplexes barcodes.
    Software
    released
  • CCS — source
    CCS (Circular Consensus), a tool in the SMRTanalysis software suite available from Pacific Biosciences, generates highly accurate single-molecule consensus reads.
    Software
    released
  • VALIS — source
    High performance WebGL genome visualization
    Software
    released
  • Croo — source
    Croo is a Python package for organizing outputs from Cromwell. Croo parses metadata.json which is an output from Cromwell and makes an organized directory with a copy (or a soft link) of each output file as described in an output definition JSON file specified by --out-def-json.
    Software type: framework
    Software
    released
  • Caper — source
    Caper (Cromwell Assisted Pipeline ExecutoR) is a wrapper Python package for Cromwell. Caper is based on Unix and cloud platform CLIs (curl, gsutil and aws) and provides an easier way of running Cromwell in server and run modes by automatically composing the necessary input files for Cromwell. Caper also supports automatic file transfer between local and cloud storage (local path, s3://, gs:// and http(s)://). You can use these URIs in an input JSON file or in the WDL file itself.
    Software type: framework
    Software
    released
  • encode_utils — source
    Tools that are useful to any ENCODE Consortium submitting group, as well as the general community working with ENCODE data. Library and scripts are coded in Python.
    Software
    released
  • Check Files — source
    Files are checked to confirm that the MD5 sum (for both gzipped and ungzipped content) matches the submitted metadata, and are also run through the validateFiles program from Jim Kent's source utilities.
    Software
    released
  • SnoVault — source
    The ENCODE DCC has created a general purpose software system, known as SnoVault, that supports metadata and file submission, a database used for metadata storage, web pages for displaying the metadata and a robust API for querying the metadata.
    Software type: database
    Software
    released
  • encodeD — source
    Metadata database for ENCODE project
    Software type: database
    Software
    released
  • liftOver
    This UCSC tool converts genome coordinates and genome annotation files between assemblies.
    Software
    released
  • csaw — source
    Detection of differentially bound regions in ChIP-seq data with sliding windows, with methods for normalization and proper FDR control.
    Software type: peak caller
    Software
    released
  • fgbio — source
    A set of tools to analyze genomic data with a focus on Next Generation Sequencing.
    Software
    released
  • eCLIP Repeat Family pipeline — source
    Pipeline for mapping repetitive elements.
    Software type: other
    Software
    released
  • eCLIP core pipeline — source
    Custom software developed by Yeo lab for use in the eCLIP pipeline.
    Software type: other
    Software
    released
  • fastq-tools — source
    A collection of small and efficient programs for performing some common and uncommon tasks with FASTQ files.
    Software type: other
    Software
    released
  • UMI-tools — source
    Tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes.
    Software type: other
    Software
    released
  • mpraflow-tsv-to-bed
    This is a one-line custom Perl script used to generate a bed format file from tsv.
    Software
    released
  • MPRAflow — source
    This pipeline processes sequencing data from Massively Parallel Reporter Assays (MPRA) to create count tables for candidate sequences tested in the experiment.
    Software
    released
  • FASTQ read-name correction
    A script resolving FASTQ read-name inconsistencies
    Software
    released
  • Harmony — source
    As described in our manuscript, “Fast, sensitive and accurate integration of single-cell data with Harmony”.
    Software type: other
    Software
    released
  • Presto — source
    Presto performs a fast Wilcoxon rank sum test and auROC analysis, as described in our manuscript, “Presto scales Wilcoxon and auROC analyses to millions of observations”.
    Software type: other
    Software
    released
  • Symphony — source
    As described in our manuscript, “Efficient and precise single-cell reference atlas mapping with Symphony”.
    Software type: other
    Software
    released
  • scPOST — source
    Simulation of single-cell datasets for power analyses that estimate power to detect cell state frequency shifts between conditions (e.g. an expansion of a cell state in disease vs. healthy), as described in our manuscript “Maximizing statistical power to detect clinically associated cell states with scPOST”.
    Software type: other
    Software
    released
  • cdr3-QTL — source
    We tested associations between HLA genotypes and TCR-CDR3 amino acid compositions. We treated the amino acid composition of CDR3 as a quantitative trait, and tested its association with HLA genotype; we call this CDR3 quantitative trait loci analysis (cdr3-QTL), as described in our manuscript “HLA autoimmune risk alleles restrict the hypervariable region of T cell receptors”.
    Software type: other
    Software
    released
  • Imperio — source
    This software includes (i) DeepBoost, a gradient boosting method for constructing boosted deep learning annotations by integrating deep learning allelic-effect annotations with fine-mapped SNPs; (ii) tools to combine these deep learning annotations with SNP-to-gene (S2G) linking strategies and relevant gene sets, and (iii) Imperio, a method for integrating deep learning annotations with S2G strategies to predict gene expression in whole blood and construct allelic-effect annotations based on changes in predicted expression. Applications of these 3 approaches to blood-related traits are described in our manuscript “Integrative approaches to improve the informativeness of deep learning models for human complex diseases”.
    Software type: other
    Software
    released
  • GSSG — source
    GSSG consists of tools to generate enhancer-driven and master-regulator gene scores in blood, and combine these gene scores with distal and proximal SNP-to-gene (S2G) linking strategies to construct SNP annotations for blood-related traits, as described in our manuscript “Unique contribution of enhancer-driven and master-regulator genes to autoimmune disease revealed using functionally informed SNP-to-gene linking strategies”.
    Software type: other
    Software
    released
  • S-LDXR — source
    S-LDXR is a method for stratifying squared trans-ethnic genetic correlation across genomic annotations, as described in our manuscript “Population-specific causal disease effect sizes in functionally important regions impacted by selection”.
    Software type: other
    Software
    released
  • AnnotBoost — source
    AnnotBoost is a gradient boosting-based framework to impute and denoise Mendelian disease-derived pathogenicity scores to improve their informativeness for common disease, as described in our manuscript “Improving the informativeness of Mendelian disease-derived pathogenicity scores for common disease”.
    Software type: variant annotation
    Software
    released
  • PolyFun — source
    PolyFun is a method that leverages genome-wide functional annotations to improve the fine-mapping power, as described in our manuscript “Functionally-informed fine-mapping and polygenic localization of complex trait heritability”.
    Software type: other
    Software
    released
  • ptools — source
    A data-sanitization software allowing raw functional genomics reads to be shared while minimizing privacy leakage, enabling principled privacy-utility trade-offs.
    Software
    released
  • ChromImpute — source
    ChromImpute is software for large-scale systematic epigenome imputation. ChromImpute takes an existing compendium of epigenomic data and uses it to predict signal tracks for mark-sample combinations not experimentally mapped or to generate a potentially more robust version of data sets that have been mapped experimentally. ChromImpute bases its predictions on features from signal tracks of other marks that have been mapped in the target sample and the target mark in other samples with these features combined using an ensemble of regression trees.
    Software
    released
  • BBDUK — source
    This tool from the BBMap package filters, trims, or masks reads with kmer matches to an artifact/contaminant file.
    Software
    released
  • Bru-seq Tools
    The Ljungman lab scripts used to process Bru-seq, BruUV-seq, and BruChase data.
    Software
    released
  • Cell Ranger — source
    Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate feature-barcode matrices and perform clustering and gene expression analysis (mkfastq, count, aggr, and reanalyze).
    Software
    released
  • xsv — source
    xsv is a command line program for indexing, slicing, analyzing, splitting and joining CSV files.
    Software
    released
  • bsseq — source
    This R package is the reference implementation of the BSmooth algorithm for analyzing whole-genome bisulfite sequencing (WGBS) data.
    Software
    released
  • gemBS — source
    gemBS is a high-performance bioinformatic pipeline designed for high-throughput analysis of DNA methylation data from whole-genome bisulfite sequencing (WGBS). It combines GEM3, a high-performance read aligner, and bs_call, a high-performance variant and methylation caller, into a streamlined and efficient pipeline for bisulfite sequence analysis.
    Software
    released
  • REDITs — source
    REDITs contains a suite of tools to identify differential RNA editing sites using RNA-seq data.
    Software type: other
    Software
    released
  • mountainClimber — source
    mountainClimber is a method for de novo identification of alternative transcript start sites and polyadenylation sites in RNA-seq data
    Software type: transcript identification
    Software
    released
  • BEAPR — source
    BEAPR is a method to identify allele-specific binding of RNA-binding proteins using eCLIP-seq data, as described in our paper “Allele-specific binding of RNA-binding proteins reveals functional genetic variants in the RNA”.
    Software type: variant annotation
    Software
    released
  • WashU Epigenome Browser — source
    The WashU Epigenome Browser provides visualization, integration and analysis tools for epigenomic datasets. Since 2010, it has provided the scientific community with data from large consortia including the Roadmap Epigenomics and the ENCODE projects. Browser features include: (i) visualization using virtual reality (VR), which has implications in biology education and the study of 3D chromatin structure; (ii) expanded public data hubs, including data from the 4DN, ENCODE, Roadmap Epigenomics, TaRGET, IHEC and TCGA consortia; (iii) a more responsive user interface; (iv) a history of interactions, which enables undo and redo; (v) a feature we call Live Browsing, which allows multiple users to collaborate remotely on the same session; (vi) the ability to visualize local tracks and data hubs. Amazon Web Services also hosts the browser at https://epigenomegateway.org/.
    Software type: database, other
    Software
    released
  • casTLE — source
    casTLE (cas9 High Throughput maximum Likelihood Estimator) uses an Empirical Bayesian framework to account for multiple sources of variability, including reagent efficacy and off-target effects, for the analysis of large-scale genomic perturbation screens.
    Software
    released
  • Mediated Expression Score Regression (MESC) — source
    MESC is a method for quantifying genetic effects on disease mediated by assayed gene expression levels (Yao et al. 2020 Nat Genet).
    Software type: quantification
    Software
    released
  • Stratified LD fourth moments (S-LD4M) — source
    This software implements our Stratified LD 4th moments regression (S-LD4M) method for estimating polygenicity across allele frequencies and functional categories, as described in our manuscript “Polygenicity of complex traits is explained by negative selection”.
    Software type: quantification
    Software
    released
  • FINDOR — source
    This software implements our Functionally Informed Novel Discovery Of Risk loci (FINDOR) method, as described in our manuscript “Leveraging polygenic functional enrichment to improve GWAS power”.
    Software type: variant annotation
    Software
    released
  • Ascertained Sequentially Markovian Coalescent (ASMC) — source
    ASMC is a method for inferring pairwise coalescence times implicating regions under negative selection that are enriched for disease heritability (Palamara et al. 2018 Nat Genet).
    Software type: other
    Software
    released
  • Signed LD profile (SLDP) regression — source
    Signed LD profile regression is a method for identifying genome-wide directional effects of signed functional annotations on diseases and complex traits, as described in our manuscript “Detecting genome-wide directional effects of transcription factor binding on polygenic disease risk”.
    Software type: other
    Software
    released
  • bam2pairs — source
    This script converts a paired-end bam file to a pairs file.
    Software type: file format conversion
    Software
    released
  • Vierstra digital genomic footprinting — source
    footprint-tools is a Python module for de novo detection of genomic footprints from DNase I data by simulating expected cleavage rates using a 6-mer DNase I cleavage preference model combined with density smoothing. The statistical significance of per-nucleotide cleavages is computed from a series of empirically fit negative binomial distributions.
    Software
    released
  • Altius Index — source
    Method for generating a master list / Index of DNaseI-Hypersensitive Sites ("consensus DHSs").
    Software
    released
  • CPU — source
    ChIA-PET Utilities is a collection of efficient specialized programs for processing ChIA-PET data from raw reads to interactions.
    Software
    released
  • GuideScan — source
    A generalized CRISPR guideRNA design tool
    Software type: other
    Software
    released
  • DESeq2 — source
    Differential gene expression analysis based on the negative binomial distribution
    Software type: quantification
    Software
    released
  • CQN — source
    Software used to generate a GC-content- and length-normalized matrix
    Software type: quantification
    Software
    released
  • Salmon — source
    Software used to generate transcriptome quantifications
    Software type: quantification
    Software
    released
  • Tximport — source
    Software used to generate a length-scaled TPM matrix
    Software type: quantification
    Software
    released
  • HTSeq — source
    HTSeq: Analysing high-throughput sequencing data with Python
    Software type: quantification
    Software
    released
  • Generate SIRV GTFs — source
    Set of scripts used to generate GTFs that include SIRV sequences for use with the ENCODE long read RNA-seq pipeline.
    Software type: other
    Software
    released
  • STARRPeaker — source
    Peak caller for STARR-seq data.
    Software type: peak caller
    Software
    released
  • bedSort — source
    UCSC Genome Browser tool for sorting .bed files by chrom,chromStart.
    Software type: other
    Software
    released
  • CrossStitch — source
    CrossStitch creates personalized reference-quality diploid genomes without de novo assembly. Rather than trying to assemble a genome from scratch, it leverages a reference genome as a baseline and then updates it with any SNPs, indels, or structural variations present in your sample. For the best results, the data requirements are similar to a de novo assembly: Illumina-based data for SNPs and indels, long-read data for structural variants, and phasing data such as 10X Linked Reads and/or Hi-C data. However, the CrossStitch process is much less demanding, produces more accurate results, and is much more predictable. The output will be a phased VCF file with all variants (SNPs, indels, and SVs) as well as a phased personalized diploid genome including 2 copies of each chromosome with the variants inserted at the correct locations.
    Software
    released
  • SURVIVOR — source
    SURVIVOR is a tool set for simulating/evaluating SVs, merging and comparing SVs within and among samples, and includes various methods to reformat or summarize SVs.
    Software
    released
  • Avocado — source
    Avocado is a multi-scale deep tensor factorization method for learning a latent representation of the human epigenome. The purpose of this model is twofold: first, to impute epigenomic experiments that have not yet been performed, and second, to learn a latent representation of the human epigenome that can be used as input for machine learning models in place of the epigenomic data itself.
    Software
    released
  • pbsv — source
    pbsv is a suite of tools to call and analyze structural variants in diploid genomes from PacBio single molecule real-time sequencing (SMRT) reads. The tools power the Structural Variant Calling analysis workflow in PacBio's SMRT Link GUI. pbsv calls insertions, deletions, inversions, duplications, and translocations. Both single-sample calling and joint (multi-sample) calling are provided.
    Software
    released
  • Sniffles — source
    Sniffles is a structural variation caller using third generation sequencing (PacBio or Oxford Nanopore). It detects all types of SVs (10bp+) using evidence from split-read alignments, high-mismatch regions, and coverage analysis.
    Software
    released
  • NGMLR — source
    NGMLR is a long-read mapper designed to align PacBio or Oxford Nanopore reads (standard and ultra-long) to a reference genome, with a focus on reads that span structural variations.
    Software
    released
  • HapCUT2 — source
    HapCUT2 is a maximum-likelihood-based tool for assembling haplotypes from DNA sequence reads.
    Software
    released
  • freebayes — source
    freebayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.
    Software
    released
  • TALON — source
    TALON is a program for identifying, quantifying, and filtering known and novel genes/isoforms in long-read transcriptome data sets. It is technology-agnostic in that it works from mapped SAM files, allowing data from different sequencing platforms (e.g. PacBio and Oxford Nanopore) to be analyzed side by side.
    Software
    released
  • TranscriptClean — source
    TranscriptClean is a tool for variant-aware, reference-based error correction of long reads.
    Software
    released
  • minimap2 — source
    Minimap2 aligns long reads to a reference genome.
    Software
    released
  • RBNS Pipeline — source
    From Burge lab (Freese, P.): "The RBNS pipeline is a set of bioinformatics tools to analyze data from high-throughput sequencing experiments of protein-bound RNAs. The current version includes read splitting, calculation of kmer frequencies and enrichments, QC metrics, production of motif sequence logos, and RNA secondary structure analysis."
    Software type: other
    Software
    released
  • Sambamba — source
    Sambamba is a high-performance, highly parallel, robust and fast tool (and library), written in the D programming language, for working with SAM and BAM files. Because of its efficiency, it is an important workhorse running in many sequencing centres around the world today.
    Software
    released
  • kallisto
    kallisto is a program for quantifying abundances of transcripts from RNA-Seq data, or more generally of target sequences using high-throughput sequencing reads. It is based on the novel idea of pseudoalignment for rapidly determining the compatibility of reads with targets, without the need for alignment. On benchmarks with standard RNA-Seq data, kallisto can quantify 30 million human reads in less than 3 minutes on a Mac desktop computer using only the read sequences and a transcriptome index that itself takes less than 10 minutes to build. Pseudoalignment of reads preserves the key information needed for quantification, and kallisto is therefore not only fast, but also as accurate as existing quantification tools. In fact, because the pseudoalignment procedure is robust to errors in the reads, in many benchmarks kallisto significantly outperforms existing tools.
    Software
    released
  • cWorld-Dekker — source
    A collection of perl/python/R scripts for manipulating 5C/Hi-C data, developed by Dekker lab.
    Software
    released
  • cMapping — source
    Mapping pipeline for 5C/Hi-C experiments developed by Dekker Lab
    Software
    released
  • RSeQC — source
    RSeQC package provides a number of useful modules that can comprehensively evaluate high throughput sequence data especially RNA-seq data. Some basic modules quickly inspect sequence quality, nucleotide composition bias, PCR bias and GC bias, while RNA-seq specific modules evaluate sequencing saturation, mapped reads distribution, coverage uniformity, strand specificity, transcript level RNA integrity etc.
    Software type: filtering
    Software
    released
  • filterEnrich — source
    Calculate average signals
    Software
    released
  • calcEnrich — source
    Calculate enrichment file using RT stop as foreground and base density as background
    Software
    released
  • normalizeRTfile — source
    Normalize base density file
    Software
    released
  • combineRTreplicates — source
    Calculate average signals
    Software
    released
  • calcRT — source
    Calculate RT stops from sam file
    Software
    released
  • estimateRPKM — source
    Calculate expression values for all transcripts in FPKM from SAM files
    Software
    released
  • trimming — source
    Trimming fastq file to remove possible adapter contamination and also barcode regions
    Software
    released
  • readCollapse — source
    Collapse fastq file to remove PCR duplicates
    Software
    released
  • Surrogate Variable Analysis — source
    The sva package in Bioconductor contains functions for removing batch effects and other unwanted variation in high-throughput experiments.
    Software type: other
    Software
    released
  • Cuffdiff — source
    Cuffdiff can be used to find significant changes in transcript expression, splicing, and promoter use.
    Software type: quantification
    Software
    released
  • Miso — source
    MISO (Mixture of Isoforms) is a probabilistic framework that quantitates the expression level of alternatively spliced genes from RNA-Seq data, and identifies differentially regulated isoforms or exons across samples
    Software type: quantification
    Software
    released
  • rMATs — source
    MATS is a computational tool to detect differential alternative splicing events from RNA-Seq data.
    Software type: quantification
    Software
    released
  • DESeq — source
    DESeq is an R package to analyse count data from high-throughput sequencing assays such as RNA-Seq and test for differential expression.
    Software type: quantification
    Software
    released
  • Merge Peaks — source
    CWL-defined pipeline for using IDR to produce a set of peaks given two replicate eCLIP peaks
    Software
    released
  • juicertools
    Software package for analysis of Hi-C data.
    Software
    released
  • Arrowhead — source
    Arrowhead is an algorithm for finding contact domains.
    Software
    released
  • Juicer
    Juicer is a platform for analyzing high-resolution Hi-C data.
    Software
    released
  • dbGaP SRA to fastq
    Converts dbGaP-protected raw data in sra format to fastq format.
    Software
    released
  • Java
    Java virtual machine
    Software
    released
  • hictools — source
    Old version of Juicer software, preserved for archival purposes.
    Software type: other
    Software
    released
  • Pysam
    Python module wrapping the htslib C API and samtools for accessing SAM-formatted alignment files
    Software type: other
    Software
    released
  • dnase-density
    Generates normalized density signal from aligned and filtered reads for DNase ENCODE uniform processing pipeline.
    Software type: file format conversion
    Software
    released
  • dnase-qc-bam
    Evaluates a sample of paired or single-end aligned and filtered reads for DNase ENCODE uniform processing pipeline.
    Software type: quality metric
    Software
    released
  • PREDICTD — source
    PaRallel Epigenomics Data Imputation with Cloud-based Tensor Decomposition
    Software
    released
  • bigWigAverageOverBed — source
    Compute average score of big wig over each bed, which may have introns.
    Software
    released
  • ATAC-seq software tools — source
    Tools developed by the C. Leslie lab for ATAC-seq data analysis
    Software
    released
  • lrna-signals
    Signal generation for bulk-RNA-seq ENCODE uniform processing pipeline.
    Software type: file format conversion
    Software
    released
  • MATS — source
    MATS is a computational tool to detect differential alternative splicing events from RNA-Seq data. The statistical model of MATS calculates the P-value and false discovery rate that the difference in the isoform ratio of a gene between two conditions exceeds a given user-defined threshold. From the RNA-Seq data, MATS can automatically detect and analyze alternative splicing events corresponding to all major types of alternative splicing patterns. MATS handles replicate RNA-Seq data from both paired and unpaired study designs.
    Software
    released
  • Bowtie 2
    Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
    Software
    released
  • Juicebox — source
    Juicebox is visualization software for Hi-C data. In this distribution, we include both the visualization software itself and command line tools for creating and analyzing files that can be loaded into Juicebox.
    Software
    released
  • make_bigwig_files.py — source
    Converts BAM to bigWig
    Software type: file format conversion
    Software
    released
  • Mango — source
    Mango: a bias-correcting ChIA-PET analysis pipeline
    Software type: peak caller
    Software
    released
  • rtracklayer — source
    Extensible framework for interacting with multiple genome browsers (currently UCSC built-in) and manipulating annotation tracks in various formats (currently GFF, BED, bedGraph, BED15, WIG, BigWig and 2bit built-in). The user may export/import tracks to/from the supported browsers, as well as query and modify the browser state, such as the current viewport.
    Software
    released
  • Genomic Alignments — source
    Provides efficient containers for storing and manipulating short genomic alignments (typically obtained by aligning short reads to a reference genome). This includes read counting, computing the coverage, junction detection, and working with the nucleotide content of the alignments.
    Software
    released
  • vcf2diploid — source
    Creates phased diploid genomes from a VCF file by integrating variants into a reference genome.
    Software type: variant annotation
    Software
    released
  • LongRanger
    Long Ranger is a set of analysis pipelines that processes Chromium sequencing output to align reads and call and phase SNPs, indels, and structural variants. These pipelines combine Chromium-specific algorithms with widely used components such as BWA, Freebayes, and GATK. Output is delivered in standard BAM, VCF, and BEDPE formats that are augmented with long range information.
    Software type: aligner, variant annotation
    Software
    released
  • chromCor.Rscript — source
    Calculates correlation between two replicate DNase signals.
    Software type: quality metric
    Software
    released
  • bigWigToWig — source
    The binary bigWig format can be converted to the text-based wig or bedGraph formats using this utility.
    Software type: file format conversion
    Software
    released

©2025 Stanford University