ENCODE Software

All software used or developed by the ENCODE Consortium

Showing 100 of 324 results

FUNCODE — source
Scripts for computing Functional Conservation of DNA Elements (FUNCODE) scores from ENCODE DNase-seq, ATAC-seq and Histone ChIP-seq data
Software
released
mex_gene_archive — source
mex_gene_archive is a minimal file format designed to meet the needs of archiving sparse gene matrices in a format compatible with the ENCODE 4 Data Coordination Center.
Software type: other
Software
released
CNDBTools — source
Used to generate the in-silico Hi-C map for each chromosome.
Software
released
OpenMiChrom — source
Chromatin dynamics simulation software used to create an ensemble of 3D structures, taking the sequence annotations (BED file) from PyMEGABASE as input.
Software
released
PyMEGABASE — source
PyMEGABASE is used to generate sequence annotations at the compartment and subcompartment level for physical modeling annotations.
Software
released
PROcapNet Model Zoo Pipeline — source
Software for BPNet models using PRO-cap data.
Software type: machine learning
Software
released
ProCapNet — source
Software for BPNet models using PRO-cap data.
Software type: machine learning
Software
released
TF ChIP-seq BPNet Model Zoo Pipeline — source
Placeholder description.
Software type: machine learning
Software
released
BPNet — source
BPNet is a python package with a CLI to train and interpret base-resolution deep neural networks trained on functional genomics data such as ChIP-nexus or ChIP-seq.
Software type: machine learning
Software
released
pyranges
GenomicRanges for Python.
Software
released
pandas
Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
Software
released
Swan
Swan is a Python library designed for the analysis and visualization of transcriptomes.
Software type: other
Software
released
seqFISH+ — source
Pipeline to process seqFISH data.
Software type: quantification, other
Software
released
EPIraction — source
The EPIraction algorithm uses Tikhonov-regularized least squares models to predict the interacting promoter-enhancer pairs.
Software
released
GT_Scan — source
GT-Scan is a web-based tool that scans a user-defined genomic region for candidate targets and ranks them in terms of the number of exact or approximate off-targets in the genome.
Software
released
Cerberus
Cerberus software for long-read RNA-seq analysis
Software type: other
Software
released
LAPA — source
Alternative polyadenylation detection from diverse data sources such as 3'-seq, long reads, and short reads.
Software type: other
Software
released
GraphReg — source
GraphReg (Chromatin interaction aware gene regulatory modeling with graph attention networks) is a graph neural network based gene regulation model which integrates DNA sequence, 1D epigenomic data (such as chromatin accessibility and histone modifications), and 3D chromatin conformation data (such as Hi-C, HiChIP, Micro-C, HiCAR) to predict gene expression in an informative way.
Software
released
HiCDCPlus — source
The package HiCDCPlus provides methods to determine significant and differential chromatin interactions by use of a negative binomial generalized linear model, as well as implementations for TopDom to call topologically associating domains (TADs), and Juicer eigenvector to find the A/B compartments. This vignette explains the use of the package and demonstrates typical workflows on HiC and HiChIP data.
Software
released
Zerone
Zerone discretizes several ChIP-seq replicates simultaneously and resolves conflicts between them. Publication available at: doi: 10.1093/bioinformatics/btw336
Software
released
GEM-Tools
GEM-Tools is a C API and a Python module to support and simplify usage of the GEM Mapper.
Software
released
Fastx Toolkit — source
The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.
Software
released
TRACE
Transcription Factor Footprinting Using DNase I Hypersensitivity Data and DNA Sequence
Software
released
psf-to-bedpe — source
Quick script that converts psf to bedpe.
Software
released
3d-dna — source
We begin with a series of iterative steps whose goal is to eliminate misjoins in the input scaffolds. Each step begins with a scaffold pool (initially, this pool is the set of input scaffolds themselves). The scaffolding algorithm is used to order and orient these scaffolds. Next, the misjoin correction algorithm is applied to detect errors in the scaffold pool, thus creating an edited scaffold pool. Finally, the edited scaffold pool is used as an input for the next iteration of the misjoin correction algorithm. The ultimate effect of these iterations is to reliably detect misjoins in the input scaffolds without removing correctly assembled sequence. After this process is complete, the scaffolding algorithm is applied to the revised input scaffolds, and the output – a single “megascaffold” which concatenates all the chromosomes – is retained for post-processing.
Software
released
bioraddbg ATAC-seq MACS2 — source
This Docker container provides an easy to use Docker interface to MACS2 for peak calling with settings tailored for Bio-Rad Single Cell ATAC-seq chemistry.
Software
released
bioraddbg ATAC-seq filter beads — source
This Docker container provides an easy to use Docker interface to a bead filtration tool with settings tailored for Bio-Rad Single Cell ATAC-seq chemistry. This container takes in .BAM files and performs "knee calling" to compute a bead barcode whitelist and Jaccard index threshold for bead-to-droplet merging.
Software
released
bioraddbg ATAC-seq BWA — source
This Docker container provides an easy to use Docker interface to the BWA alignment tool with settings tailored for Bio-Rad ATAC-Seq chemistry.
Software
released
bioraddbg ATAC-seq deconvolute — source
This Docker container provides an easy to use Docker interface to the BAP tool with settings tailored for Bio-Rad ATAC-seq chemistry.
Software
released
guppy_basecaller — source
Ont-Guppy is a basecalling software available to Oxford Nanopore customers. For more information, please see https://nanoporetech.com/
Software
released
MAGECK — source
Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout (MAGeCK) is a computational tool to identify important genes from the recent genome-scale CRISPR-Cas9 knockout screens technology.
Software type: other
Software
released
RELICS — source
RELICS is an analysis method for discovering functional sequences from tiling CRISPR screens.
Software type: quantification
Software
released
polyAsite_workflow — source
Pipeline to infer poly(A) site clusters through processing of 3' end sequencing libraries prepared according to various protocols.
Software
released
gencode_utr_fix — source
This package fixes UTR features in the third column of GENCODE GTF files by converting UTR annotations into five_prime_utr and three_prime_utr, similar to Ensembl.
Software
released
pyfaidx — source
This python module implements pure Python classes for indexing, retrieval, and in-place modification of FASTA files using a samtools compatible index.
Software
released
seqkit — source
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation in Golang
Software
released
STARsolo — source
STARsolo is a tool for mapping, demultiplexing, and quantification for single cell RNA-seq.
Software
released
HTSlib — source
A C library for reading/writing high-throughput sequencing data
Software
released
fastp — source
Tool for preprocessing fastq files
Software
released
DELTA — source
Tool to produce chromatin stripes, long range chromatin interactions, and topologically associated domains for Hi-C data.
Software type: other
Software
released
SLICE — source
Subcompartment Landscape Identification via Clustering Enrichments
Software type: other
Software
released
interpretation_samples — source
Interpretation code for Segway samples that produces classifier output and diagnostic plots from the apply_samples.py, for test samples.
Software type: genome segmentation
Software
released
Scanpy — source
Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing.
Software
released
LR-splitpipe — source
Demultiplexing and debarcoding tool designed for LR-Split-seq data.
Software
released
Seurat — source
Seurat is an R package designed for QC, analysis, and exploration of single-cell RNA-seq data.
Software
released
split-pipe — source
The Parse Biosciences computational pipeline is an out-of-the-box software tool that you can run locally to convert fastq files straight to processed data (including gene-cell count matrices). Customers purchasing the Whole Transcriptome Kit will receive access to the Parse computational pipeline.
Software
released
dREG — source
Detecting Regulatory Elements using GRO-seq and PRO-seq
Software
released
bigWigMerge — source
This tool from kentUtils merges together multiple bigWigs into a single output
Software
released
PRINSEQ Lite — source
PRINSEQ will preprocess genomic or metagenomic sequence data in FASTA or FASTQ format
Software
released
sQTLseekeR — source
sQTLseekeR is a package to detect splicing QTLs (sQTLs), which are variants associated with a change in the splicing pattern of a gene. In sQTLseekeR, splicing patterns are modeled by the relative expression of the transcripts of a gene. The most recent version of sQTLseekeR can be employed to detect genetic variants associated with any multivariate phenotype.
Software type: variant annotation
Software
released
ggsashimi — source
A command-line tool for the visualization of splicing events across multiple samples. Given a specified genomic region, ggsashimi creates sashimi plots for individual RNA-seq experiments as well as aggregated plots for groups of experiments. It uses popular bioinformatics file formats, is annotation-independent, and allows the visualization of splicing events even for large genomic regions by scaling down the genomic segments between splice sites. It is implemented in Python and internally generates R code for plotting.
Software type: visualization
Software
released
FORGE2 — source
FORGE2 identifies tissue- or cell type-specific signal by analysing a minimum set of 5 single nucleotide polymorphisms (SNPs) for overlap with epigenetic data peaks compared to matched background SNPs and provides both graphical and tabular outputs.
Software type: integrated analysis
Software
released
eFORGE — source
eFORGE identifies tissue or cell type-specific signal by analysing a minimum set of 5 differentially methylated positions (DMPs) for overlap with DNase I hypersensitive sites (DHSs) compared to matched background DMPs and provides both graphical and tabulated outputs.
Software type: integrated analysis
Software
released
GenomeStudio — source
Software developed by Illumina for analysis of microarray data.
Software type: other
Software
released
merge_bcs — source
This Jupyter notebook merges files of barcodes to create a pass list of barcodes in common between the input files.
Software
released
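As a rough illustration of the merge_bcs entry above, the sketch below intersects several barcode files to build a pass list. The one-barcode-per-line file layout and the function names read_barcodes and common_barcodes are assumptions made for illustration; they are not taken from the notebook itself.

```python
def read_barcodes(path):
    """Read one barcode per line (assumed layout), skipping blank lines."""
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}


def common_barcodes(paths):
    """Return the sorted barcodes present in every input file."""
    barcode_sets = [read_barcodes(p) for p in paths]
    if not barcode_sets:
        return []
    # Keep only the barcodes shared by all files.
    return sorted(set.intersection(*barcode_sets))
```

Calling common_barcodes with a list of file paths returns the shared barcodes in sorted order, which could then be written out as the pass list.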
ptools_bin — source
A data-sanitization software allowing raw functional genomics reads to be shared while minimizing privacy leakage, enabling principled privacy-utility trade-offs.
Software type: other
Software
released
SCREEN — source
SCREEN is a web-based visualizer for the ENCODE Registry of cCREs. Users can search for cCREs by genomic region or by associated features such as genes and SNPs, and can also visualize associated underlying annotations from the ground and integrative levels of the ENCODE Encyclopedia such as gene expression, TF ChIP-seq peaks, chromatin states, and cCRE-target gene links. Additionally, users can access ENCODE data on the functional characterization of cCREs.
Software
released
POSSUM — source
PCA Of Sparse, SUper Massive Matrices (POSSUM) contains R and C/C++ functions for very fast eigenvector calculation
Software
released
apricot — source
apricot implements submodular optimization for the purpose of summarizing massive data sets into minimally redundant subsets that are still representative of the original data. These subsets are useful for both visualizing the modalities in the data and for training accurate machine learning models with just a fraction of the examples and compute.
Software
released
Pairix — source
Pairix is a tool for indexing and querying on a block-compressed text file containing pairs of genomic coordinates.
Software
released
IsoSeq3 — source
IsoSeq3, a tool in the SMRTanalysis software suite available from Pacific Biosciences, contains tools for identifying transcripts (detecting polyA tails and concatemers, read clustering, and deduplication).
Software
released
Lima — source
Lima, a tool in the SMRTanalysis software suite available from Pacific Biosciences, removes primers and demultiplexes barcodes.
Software
released
CCS — source
CCS (Circular Consensus), a tool in the SMRTanalysis software suite available from Pacific Biosciences, generates highly accurate single-molecule consensus reads.
Software
released
VALIS — source
High performance WebGL genome visualization
Software
released
Croo — source
Croo is a Python package for organizing outputs from Cromwell. Croo parses metadata.json which is an output from Cromwell and makes an organized directory with a copy (or a soft link) of each output file as described in an output definition JSON file specified by --out-def-json.
Software type: framework
Software
released
Caper — source
Caper (Cromwell Assisted Pipeline ExecutoR) is a wrapper Python package for Cromwell. Caper is based on Unix and cloud platform CLIs (curl, gsutil and aws) and provides an easier way of running Cromwell server/run modes by automatically composing the necessary input files for Cromwell. Caper also supports automatic file transfer between local and cloud storage (local path, s3://, gs:// and http(s)://). You can use these URIs in the input JSON file or for the WDL file itself.
Software type: framework
Software
released
encode_utils — source
Tools that are useful to any ENCODE Consortium submitting group, as well as the general community working with ENCODE data. Library and scripts are coded in Python.
Software
released
Check Files — source
Checks that each file's MD5 sum (for both the gzipped and ungzipped content) is identical to the submitted metadata, and runs the file through the validateFiles program from Jim Kent's source utilities.
Software
released
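The MD5 comparison described in the Check Files entry above can be sketched roughly as follows. This is a minimal illustration, not the Check Files implementation: the function names (md5_of_stream, md5_pair, matches_metadata) and the idea of passing the two submitted checksums as arguments are assumptions, and the validateFiles step is not covered.

```python
import gzip
import hashlib


def md5_of_stream(fh, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a binary stream in fixed-size chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: fh.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def md5_pair(path):
    """Return (MD5 of the gzipped file, MD5 of its uncompressed content)."""
    with open(path, "rb") as raw:
        file_md5 = md5_of_stream(raw)
    with gzip.open(path, "rb") as unzipped:
        content_md5 = md5_of_stream(unzipped)
    return file_md5, content_md5


def matches_metadata(path, submitted_md5, submitted_content_md5):
    """Check both checksums against the values recorded in the metadata."""
    file_md5, content_md5 = md5_pair(path)
    return file_md5 == submitted_md5 and content_md5 == submitted_content_md5
```

Hashing in chunks keeps memory use flat for large sequencing files; the chunk size here is an arbitrary choice.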
SnoVault — source
The ENCODE DCC has created a general purpose software system, known as SnoVault, that supports metadata and file submission, a database used for metadata storage, web pages for displaying the metadata and a robust API for querying the metadata.
Software type: database
Software
released
encodeD — source
Metadata database for ENCODE project
Software type: database
Software
released
liftOver
This UCSC tool converts genome coordinates and genome annotation files between assemblies.
Software
released
csaw — source
Detection of differentially bound regions in ChIP-seq data with sliding windows, with methods for normalization and proper FDR control.
Software type: peak caller
Software
released
fgbio — source
A set of tools to analyze genomic data with a focus on Next Generation Sequencing.
Software
released
eCLIP Repeat Family pipeline — source
Pipeline for mapping repetitive elements.
Software type: other
Software
released
eCLIP core pipeline — source
Custom software developed by Yeo lab for use in the eCLIP pipeline.
Software type: other
Software
released
fastq-tools — source
A collection of small and efficient programs for performing some common and uncommon tasks with FASTQ files.
Software type: other
Software
released
UMI-tools — source
Tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes.
Software type: other
Software
released
FASTQ read-name correction
A script resolving FASTQ read-name inconsistencies
Software
released
Harmony — source
As described in our manuscript, “Fast, sensitive and accurate integration of single-cell data with Harmony”.
Software type: other
Software
released
Presto — source
Presto performs a fast Wilcoxon rank sum test and auROC analysis, as described in our manuscript, “Presto scales Wilcoxon and auROC analyses to millions of observations”.
Software type: other
Software
released
Symphony — source
As described in our manuscript, “Efficient and precise single-cell reference atlas mapping with Symphony”.
Software type: other
Software
released
scPOST — source
Simulation of single-cell datasets for power analyses that estimate power to detect cell state frequency shifts between conditions (e.g. an expansion of a cell state in disease vs. healthy), as described in our manuscript “Maximizing statistical power to detect clinically associated cell states with scPOST”.
Software type: other
Software
released
cdr3-QTL — source
We tested associations between HLA genotypes and TCR-CDR3 amino acid compositions. We treated the amino acid composition of CDR3 as a quantitative trait, and tested its association with HLA genotype; we call this CDR3 quantitative trait loci analysis (cdr3-QTL), as described in our manuscript “HLA autoimmune risk alleles restrict the hypervariable region of T cell receptors”.
Software type: other
Software
released
Imperio — source
This software includes (i) DeepBoost, a gradient boosting method for constructing boosted deep learning annotations by integrating deep learning allelic-effect annotations with fine-mapped SNPs; (ii) tools to combine these deep learning annotations with SNP-to-gene (S2G) linking strategies and relevant gene sets, and (iii) Imperio, a method for integrating deep learning annotations with S2G strategies to predict gene expression in whole blood and construct allelic-effect annotations based on changes in predicted expression. Applications of these 3 approaches to blood-related traits are described in our manuscript “Integrative approaches to improve the informativeness of deep learning models for human complex diseases”.
Software type: other
Software
released
GSSG — source
GSSG consists of tools to generate enhancer-driven and master-regulator gene scores in blood, and combine these gene scores with distal and proximal SNP-to-gene (S2G) linking strategies to construct SNP annotations for blood-related traits, as described in our manuscript “Unique contribution of enhancer-driven and master-regulator genes to autoimmune disease revealed using functionally informed SNP-to-gene linking strategies”.
Software type: other
Software
released
S-LDXR — source
S-LDXR is a method for stratifying squared trans-ethnic genetic correlation across genomic annotations, as described in our manuscript “Population-specific causal disease effect sizes in functionally important regions impacted by selection”.
Software type: other
Software
released
AnnotBoost — source
AnnotBoost is a gradient boosting-based framework to impute and denoise Mendelian disease-derived pathogenicity scores to improve their informativeness for common disease, as described in our manuscript “Improving the informativeness of Mendelian disease-derived pathogenicity scores for common disease”.
Software type: variant annotation
Software
released
PolyFun — source
PolyFun is a method that leverages genome-wide functional annotations to improve the fine-mapping power, as described in our manuscript “Functionally-informed fine-mapping and polygenic localization of complex trait heritability”.
Software type: other
Software
released
ptools — source
A data-sanitization software allowing raw functional genomics reads to be shared while minimizing privacy leakage, enabling principled privacy-utility trade-offs.
Software
released
ChromImpute — source
ChromImpute is software for large-scale systematic epigenome imputation. ChromImpute takes an existing compendium of epigenomic data and uses it to predict signal tracks for mark-sample combinations not experimentally mapped or to generate a potentially more robust version of data sets that have been mapped experimentally. ChromImpute bases its predictions on features from signal tracks of other marks that have been mapped in the target sample and the target mark in other samples with these features combined using an ensemble of regression trees.
Software
released
BBDUK — source
This tool from the BBMap package filters, trims, or masks reads with kmer matches to an artifact/contaminant file.
Software
released
Bru-seq Tools
The Ljungman lab scripts used to process Bru-seq, BruUV-seq, and BruChase data.
Software
released
Cell Ranger — source
Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate feature-barcode matrices and perform clustering and gene expression analysis (mkfastq, count, aggr, and reanalyze).
Software
released
xsv — source
xsv is a command line program for indexing, slicing, analyzing, splitting and joining CSV files.
Software
released
bsseq — source
This R package is the reference implementation of the BSmooth algorithm for analyzing whole-genome bisulfite sequencing (WGBS) data.
Software
released
gemBS — source
gemBS is a high performance bioinformatic pipeline designed for high-throughput analysis of DNA methylation data from whole genome bisulfite sequencing (WGBS). It combines GEM3, a high performance read aligner, and bs_call, a high performance variant and methylation caller, into a streamlined and efficient pipeline for bisulfite sequence analysis.
Software
released
REDITs — source
REDITs contains a suite of tools to identify differential RNA editing sites using RNA-seq data.
Software type: other
Software
released
mountainClimber — source
mountainClimber is a method for de novo identification of alternative transcript start sites and polyadenylation sites in RNA-seq data
Software type: transcript identification
Software
released
BEAPR — source
BEAPR is a method to identify allele-specific binding of RNA-binding proteins using eCLIP-seq data, as described in our paper “Allele-specific binding of RNA-binding proteins reveals functional genetic variants in the RNA”.
Software type: variant annotation
Software
released
WashU Epigenome Browser — source
The WashU Epigenome Browser provides visualization, integration and analysis tools for epigenomic datasets. Since 2010, it has provided the scientific community with data from large consortia including the Roadmap Epigenomics and the ENCODE projects. Browser features include: (i) visualization using virtual reality (VR), which has implications in biology education and the study of 3D chromatin structure; (ii) expanded public data hubs, including data from the 4DN, ENCODE, Roadmap Epigenomics, TaRGET, IHEC and TCGA consortia; (iii) a more responsive user interface; (iv) a history of interactions, which enables undo and redo; (v) a feature we call Live Browsing, which allows multiple users to collaborate remotely on the same session; (vi) the ability to visualize local tracks and data hubs. Amazon Web Services also hosts the browser at https://epigenomegateway.org/.
Software type: database, other
Software
released
