ENCODE Software

All software used or developed by the ENCODE Consortium

Showing 100 of 146 results

pyranges
GenomicRanges for Python.
Software
released
pandas
Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
Software
released
Zerone
Zerone discretizes several ChIP-seq replicates simultaneously and resolves conflicts between them. Publication available at: doi: 10.1093/bioinformatics/btw336
Software
released
GEM-Tools
GEM-Tools is a C API and a Python module to support and simplify usage of the GEM Mapper.
Software
released
Fastx Toolkit — source
The FASTX-Toolkit is a collection of command-line tools for preprocessing short-read FASTA/FASTQ files.
Software
released
bioraddbg ATAC-seq MACS2 — source
This Docker container provides an easy to use Docker interface to MACS2 for peak calling with settings tailored for Bio-Rad Single Cell ATAC-seq chemistry.
Software
released
bioraddbg ATAC-seq filter beads — source
This Docker container provides an easy-to-use Docker interface to a bead filtration tool with settings tailored for Bio-Rad Single Cell ATAC-seq chemistry. The container takes in BAM files and performs "knee calling" to compute a bead barcode whitelist and a Jaccard index threshold for bead-to-droplet merging.
Software
released
bioraddbg ATAC-seq BWA — source
This Docker container provides an easy to use Docker interface to the BWA alignment tool with settings tailored for Bio-Rad ATAC-Seq chemistry.
Software
released
bioraddbg ATAC-seq deconvolute — source
This Docker container provides an easy-to-use Docker interface to the BAP tool with settings tailored for Bio-Rad ATAC-seq chemistry.
Software
released
guppy_basecaller — source
Ont-Guppy is basecalling software available to Oxford Nanopore customers. For more information, please see https://nanoporetech.com/
Software
released
MAGECK — source
Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout (MAGeCK) is a computational tool to identify important genes from genome-scale CRISPR-Cas9 knockout screens.
Software type: other
Software
released
RELICS — source
RELICS is an analysis method for discovering functional sequences from tiling CRISPR screens.
Software type: quantification
Software
released
polyAsite_workflow — source
Pipeline to infer poly(A) site clusters through processing of 3' end sequencing libraries prepared according to various protocols.
Software
released
gencode_utr_fix — source
This package fixes UTR features in the third column of GENCODE GTF files by converting UTR annotations into five_prime_utr and three_prime_utr, similar to Ensembl.
Software
released
pyfaidx — source
This python module implements pure Python classes for indexing, retrieval, and in-place modification of FASTA files using a samtools compatible index.
Software
released
seqkit — source
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation in Golang
Software
released
STARsolo — source
STARsolo is a tool for mapping, demultiplexing, and quantification for single cell RNA-seq.
Software
released
HTSlib — source
A C library for reading/writing high-throughput sequencing data
Software
released
fastp — source
Tool for preprocessing fastq files
Software
released
interpretation_samples — source
Interpretation code for Segway samples that produces classifier output and diagnostic plots from the apply_samples.py, for test samples.
Software type: genome segmentation
Software
released
Scanpy — source
Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing.
Software
released
Seurat — source
Seurat is an R package designed for QC, analysis, and exploration of single-cell RNA-seq data.
Software
released
split-pipe — source
The Parse Biosciences computational pipeline is an out-of-the-box software tool that you can run locally to convert fastq files straight to processed data (including gene-cell count matrices). Customers purchasing the Whole Transcriptome Kit will receive access to the Parse computational pipeline.
Software
released
dREG — source
Detecting Regulatory Elements using GRO-seq and PRO-seq
Software
released
bigWigMerge — source
This tool from kentUtils merges together multiple bigWigs into a single output
Software
released
PRINSEQ Lite — source
PRINSEQ will preprocess genomic or metagenomic sequence data in FASTA or FASTQ format
Software
released
Pairix — source
Pairix is a tool for indexing and querying on a block-compressed text file containing pairs of genomic coordinates.
Software
released
IsoSeq3 — source
IsoSeq3, a tool in the SMRTanalysis software suite available from Pacific Biosciences, contains tools for identifying transcripts (detecting polyA tails and concatemers, read clustering, and deduplication).
Software
released
Lima — source
Lima, a tool in the SMRTanalysis software suite available from Pacific Biosciences, removes primers and demultiplexes barcodes.
Software
released
CCS — source
CCS (Circular Consensus), a tool in the SMRTanalysis software suite available from Pacific Biosciences, generates highly accurate single-molecule consensus reads.
Software
released
liftOver
This UCSC tool converts genome coordinates and genome annotation files between assemblies.
Software
released
csaw — source
Detection of differentially bound regions in ChIP-seq data with sliding windows, with methods for normalization and proper FDR control.
Software type: peak caller
Software
released
fgbio — source
A set of tools to analyze genomic data with a focus on Next Generation Sequencing.
Software
released
fastq-tools — source
A collection of small and efficient programs for performing some common and uncommon tasks with FASTQ files.
Software type: other
Software
released
UMI-tools — source
Tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes.
Software type: other
Software
released
BBDUK — source
This tool from the BBMap package filters, trims, or masks reads with kmer matches to an artifact/contaminant file.
Software
released
Cell Ranger — source
Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate feature-barcode matrices and perform clustering and gene expression analysis (mkfastq, count, aggr, and reanalyze).
Software
released
bam2pairs — source
This script converts a paired-end bam file to a pairs file.
Software type: file format conversion
Software
released
CPU — source
ChIA-PET Utilities is a collection of efficient specialized programs for processing ChIA-PET data from raw reads to interactions.
Software
released
SURVIVOR — source
SURVIVOR is a tool set for simulating/evaluating SVs, merging and comparing SVs within and among samples, and includes various methods to reformat or summarize SVs.
Software
released
pbsv — source
pbsv is a suite of tools to call and analyze structural variants in diploid genomes from PacBio single molecule real-time sequencing (SMRT) reads. The tools power the Structural Variant Calling analysis workflow in PacBio's SMRT Link GUI. pbsv calls insertions, deletions, inversions, duplications, and translocations. Both single-sample calling and joint (multi-sample) calling are provided.
Software
released
Sniffles — source
Sniffles is a structural variation caller using third generation sequencing (PacBio or Oxford Nanopore). It detects all types of SVs (10bp+) using evidence from split-read alignments, high-mismatch regions, and coverage analysis.
Software
released
NGMLR — source
NGMLR is a long-read mapper designed to align PacBio or Oxford Nanopore (standard and ultra-long) reads to a reference genome, with a focus on reads that span structural variations.
Software
released
HapCUT2 — source
HapCUT2 is a maximum-likelihood-based tool for assembling haplotypes from DNA sequence reads.
Software
released
freebayes — source
freebayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.
Software
released
Pysam
Python module wrapping the htslib C API and samtools for accessing SAM-formatted alignment files.
Software type: other
Software
released
MATS — source
MATS is a computational tool to detect differential alternative splicing events from RNA-Seq data. The statistical model of MATS calculates the P-value and false discovery rate that the difference in the isoform ratio of a gene between two conditions exceeds a given user-defined threshold. From the RNA-Seq data, MATS can automatically detect and analyze alternative splicing events corresponding to all major types of alternative splicing patterns. MATS handles replicate RNA-Seq data from both paired and unpaired study design.
Software
released
Bowtie 2
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
Software
released
bigWigToWig — source
The binary bigWig format can be converted to the text based wig or bedGraph formats using this utility.
Software type: file format conversion
Software
released
PyLiftover — source
PyLiftover is a library for quick and easy conversion of genomic (point) coordinates between different assemblies. It uses the same logic and coordinate conversion mappings as the UCSC liftOver tool.
Software type: other
Software
released
trim-adapters-illumina — source
This program trims adapters from paired-end sequencing tags produced using the Illumina platform.
Software type: filtering
Software
released
caTools — source
Several basic utility functions including: moving (rolling, running) window statistic functions, read/write for GIF and ENVI binary files, fast calculation of AUC, LogitBoost classifier, base64 encoder/decoder, round-off-error-free sum and cumsum, etc.
Software
released
edwBamFilter — source
Remove reads from a BAM file based on a number of criteria
Software type: filtering
Software
released
edwBamStats — source
Collect some basic characterization statistics of a BAM file.
Software type: quality metric
Software
released
Cufflinks — source
Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.
Software
released
GATK — source
The Genome Analysis Toolkit or GATK is a software package for analysis of high-throughput sequencing data, developed by the Data Science and Data Engineering group at the Broad Institute. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Software type: variant annotation
Software
released
SeparateReadpairs — source
Splits up the interleaved file into two valid paired fastqs.
Software type: other
Software
released
NextGenMap — source
NextGenMap is a flexible and fast read mapping program that is more than twice as fast as BWA while achieving a mapping sensitivity similar to Stampy.
Software type: aligner
Software
released
Scalpel — source
Scalpel is a software package for detecting INDEL (INsertion and DELetion) mutations in a reference genome that has been sequenced with next-generation sequencing technology (e.g., Illumina).
Software type: variant annotation
Software
released
BamTools — source
C++ API & command-line toolkit for working with BAM data
Software
released
Subread — source
High-performance read alignment, quantification and mutation discovery
Software
released
Trimmomatic — source
A flexible read trimming tool for Illumina NGS data
Software
released
deepTools — source
Tools to process and analyze deep sequencing data.
Software
released
gnuplot — source
Gnuplot is a portable command-line driven graphing utility for Linux, OS/2, MS Windows, OSX, VMS, and many other platforms. The source code is copyrighted but freely distributed (i.e., you don't have to pay for it). It was originally created to allow scientists and students to visualize mathematical functions and data interactively, but has grown to support many non-interactive uses such as web scripting. It is also used as a plotting engine by third-party applications like Octave. Gnuplot has been supported and under active development since 1986.
Software type: other
Software
released
Preseq — source
From the Smith lab: "The preseq package is aimed at predicting and estimating the complexity of a genomic sequencing library, equivalent to predicting and estimating the number of redundant reads from a given sequencing depth and how many will be expected from additional sequencing using an initial sequencing experiment."
Software type: other
Software
released
CpG methylation correlation — source
Calculates the Spearman correlation between two replicate bedMethyl files of CpG methylation.
Software type: quality metric
Software
released
cutadapt — source
Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
Software type: other
Software
released
Trim Galore — source
Trim Galore! is a wrapper script to automate quality and adapter trimming as well as quality control, with some added functionality to remove biased methylation positions for RRBS sequence files (for directional, non-directional (or paired-end) sequencing).
Software type: filtering
Software
released
permseq — source
An R package that performs multi-read mapping of ChIP-seq datasets. permseq works with bowtie and takes fastq files as input. It can work with ChIP-seq data alone or, when complementary data such as DNase-seq or histone ChIP-seq are available, it can use these data sources as prior information for multi-read mapping. The output from permseq is a text file of aligned reads available in bed, tagAlign, or bam formats.
Software
released
mosaics — source
An R package for TF and histone ChIP-seq analysis. mosaics takes aligned read files as input. It provides diagnostic plots for evaluating how well the mosaics model fits and allows FDR control. The mosaics-hmm module provides boundary-adjusted broad peak calls. The output from mosaics is a set of peaks in a number of formats, including bed. mosaics also generates intermediate data files and objects, such as genome-wide read counts at the bin level for specified bin sizes and wig files for visualization in a genome browser.
Software
released
atSNP — source
An R package for screening SNPs for their potential to enhance or disrupt transcription factor binding sites. atSNP accepts as input either SNP ids or the actual coordinates of the SNPs and the alternative alleles. It uses ENCODE motifs and JASPAR motifs to evaluate the regulatory potential of the SNPs; however, it also allows a user-specified set of transcription factor binding sites in the form of position-specific matrices. For each SNP, it outputs the significance of the match to each position-specific matrix with both the reference and the alternative allele, as well as the significance of the change in these match scores. atSNP also provides easy visualization of the SNP impact on the binding site through composite logo plots.
Software
released
wigToBigWig — source
The bigWig format is for display of dense, continuous data that will be displayed in the Genome Browser as a graph. BigWig files are created initially from wiggle (wig) type files, using the program wigToBigWig. The resulting bigWig files are in an indexed binary format. The main advantage of the bigWig files is that only the portions of the files needed to display a particular region are transferred to UCSC, so for large data sets bigWig is considerably faster than regular wiggle files.
Software type: file format conversion
Software
released
R — source
R is a free software environment for statistical computing and graphics
Software
released
Median Absolute Deviation — source
Calculates the Median Absolute Deviation (MAD) and correlation of two gene quantifications from replicate RNA-seq experiments. A measure of reproducibility, inversely correlated with data quality.
Software type: quality metric
Software
released
Tophat BAM Repair — source
tophat_bam_xsA_tag_fix.pl was written by x wei to allow the use of tophat 2.0.8 in the ENCODE pipelines. It reads a bam file generated by tophat 2.0.8 from paired-end fastqs and corrects the XS:A:+ or XS:A:- tags that indicate read strandedness.
Software type: other
Software
released
Concat-fastq — source
Concat-fastqs is an applet available for DNAnexus that concatenates a set of fastqs to be merged for analysis.
Software type: other
Software
released
bedToBigBed — source
bedToBigBed takes a standard bed file or a non-standard bed file with associated .as file to create a compressed bigBed version. Description of Big Binary Indexed (BBI) files and visualization of next-generation sequencing experiment results explained by W.J. Kent, PMCID: PMC2922891.
Software type: file format conversion
Software
released
Mott Trimmer — source
Mott-Trimmer for WGBS reads
Software
released
Makepseudoreplicates
Generate pseudoreplicates for self-consistency tests.
Software type: other
Software
released
WGBS output processor — source
Convert a Bismark CX_report file to bed-like files
Software type: other
Software
released
Samtools — source
Samtools is a suite of programs for interacting with high-throughput sequencing data. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling, and alignment viewing, and thus provides universal tools for processing read alignments (PMID: 19505943).
Software type: other
Software
released
Picard — source
A set of tools (in Java) for working with next-generation high-throughput sequencing (HTS) data in the BAM format. Picard is implemented using the HTSJDK Java library, which supports access to common file formats used for high-throughput sequencing data, such as SAM and VCF. There is currently no published paper for the Picard software.
Software type: filtering, other
Software
released
Bismark — source
A tool to map bisulfite converted sequence reads and determine cytosine methylation states. The output produced by Bismark discriminates between cytosines in CpG, CHG and CHH context and enables bench scientists to visualize and interpret their methylation data soon after the sequencing run is completed (PMID: 21493656).
Software type: other
Software
released
RSEM — source
A software package for estimating gene and isoform expression levels from RNA-Seq data. RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.
Software type: transcript identification
Software
released
TopHat — source
Fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
Software type: aligner
Software
released
Flux Capacitor — source
The exonic structure of two spliceforms. FluxCapacitor reconstructs abundances of known transcript forms from RNA-seq data (PMCID: PMC3836232).
Software type: transcript identification
Software
released
BWA — source
BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome, based on a Burrows-Wheeler index. The publication for the short-read alignment component is PMID: 19451168, while PMID: 20080505 outlines the algorithm for aligning sequences from >200 bp up to 1 Mb.
Software type: aligner
Software
released
npIDR — source
Non-parametric Irreproducible Detection Rate (npIDR) essentially takes a pooled sample of all replicas and computes (a) the frequency of seeing count=x; (b) the frequency of seeing count=x given that in *ALL* other replicas the count is equal to zero. The original Irreproducible Detection Rate statistical test is published at DOI: 10.1214/11-AOAS466.
Software type: quality metric
Software
released
FastQC — source
FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis. Babraham Bioinformatics Web site, http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Software type: quality metric
Software
released
CLIPper — source
A tool to detect CLIP-seq peaks. The Yeo lab describes their CLIP-seq cluster-identification algorithm on PMID: 24213538
Software type: quality metric, peak caller
Software
released
bedGraphToBigWig — source
Convert bedGraph to bigWig file. Description of Big Binary Indexed (BBI) files and visualization of next-generation sequencing experiment results explained by W.J. Kent, PMCID: PMC2922891
Software type: file format conversion
Software
released
WASP — source
WASP is a software package for two related tasks: (1) correcting allelic bias in mapped sequencing reads, and (2) identifying molecular quantitative trait loci (QTLs) using next-generation sequencing data (e.g. gene expression QTLs or histone mark QTLs). The WASP mapper works with any read mapping pipeline that outputs BAM or SAM format. WASP identifies molecular QTLs using a statistical test that combines information about the total depth and allelic imbalance of mapped reads. WASP can call QTLs with very small sample sizes (as few as 10) compared to traditional QTL mapping approaches.
Software type: aligner, variant annotation
Software
released
Webgestalt — source
WebGestalt is a "WEB-based GEne SeT AnaLysis Toolkit". It is designed for functional genomic, proteomic and large-scale genetic studies from which a large number of gene lists (e.g., differentially expressed gene sets, co-expressed gene sets, etc.) are continuously generated. WebGestalt incorporates information from different public resources and provides an easy way for biologists to make sense out of gene lists.
Software
released
RuleFit3 — source
RuleFit3 is a predictive learning method and interpretational tool. It is based on general regression and classification models, which are constructed as linear combinations of simple rules derived from the data. Each rule consists of a conjunction of a small number of simple statements concerning the values of individual input variables (doi:10.1214/07-AOAS148).
Software
released
Peppy — source
Peppy is software that integrates several critical tasks of proteogenomic searching and proteogenomic mapping, such as full 6-frame translation and digestion of a genome, peptide/spectrum matching and quality assessment, and calculation of false discovery rates (PMID: 23614390).
Software
released
Mfinder — source
mfinder is a software tool for network motif detection. Network motifs are defined as basic interaction patterns that recur throughout biological networks much more often than in random networks. To detect network motifs, mfinder implements two methods: a full enumeration of subgraphs and a sampling of subgraphs for estimation of subgraph concentrations. mfinder generates random networks based on the switching method, the stubs method, and the "Go with the winners" algorithm.
Software
released
lumi package — source
The lumi package in R provides an integrated solution for the Illumina microarray data analysis. It includes functions of Illumina BeadStudio (GenomeStudio) data input, quality control, BeadArray-specific variance stabilization, normalization and gene annotation at the probe level. It also includes the functions of processing Illumina methylation microarrays, especially Illumina Infinium methylation microarrays.
Software
released
King — source
KING is a rapid algorithm for relationship inference using high-throughput genotype data typical of GWAS that allows the presence of an unknown population substructure. The relationship of any pair of individuals can be precisely inferred by robust estimation of their kinship coefficient, independent of sample composition or population structure (sample invariance). KING performs properly even under extreme population stratification, while algorithms assuming a homogeneous population give systematically biased results. KING performs relationship inference on millions of pairs of individuals in the order of minutes.
Software
released
Java Treeview — source
Java Treeview is an open source, cross-platform gene expression visualization tool and an interactive display of clustered gene expression data, similar to Eisen's treeview. It is also an extensible starting point for other gene expression visualization tools.
Software
released
HiveR — source
The hive plot is a visualization method for drawing networks. Nodes are mapped to and positioned on radially distributed linear axes. Edges are drawn as curved links. Hive plots can give a quantitative understanding of important aspects of a network's structure. They can also manage the visual complexity arising from a large number of edges and expose both trends and outlier patterns in a network structure.
Software
released
