ENCODE Software

All software used or developed by the ENCODE Consortium

Showing 146 of 146 results

pyranges
GenomicRanges for Python.
Software
released
pandas
Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
Software
released
Zerone
Zerone discretizes several ChIP-seq replicates simultaneously and resolves conflicts between them. Publication available at: doi: 10.1093/bioinformatics/btw336
Software
released
GEM-Tools
GEM-Tools is a C API and a Python module to support and simplify usage of the GEM Mapper.
Software
released
Fastx Toolkit — source
The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.
Software
released
bioraddbg ATAC-seq MACS2 — source
This Docker container provides an easy to use Docker interface to MACS2 for peak calling with settings tailored for Bio-Rad Single Cell ATAC-seq chemistry.
Software
released
bioraddbg ATAC-seq filter beads — source
This Docker container provides an easy to use Docker interface to a bead filtration tool with settings tailored for Bio-Rad Single Cell ATAC-seq chemistry. This container takes in .BAM files and performs "knee calling" to compute a bead barcode whitelist and jaccard index threshold for bead-to-droplet merging.
Software
released
bioraddbg ATAC-seq BWA — source
This Docker container provides an easy to use Docker interface to the BWA alignment tool with settings tailored for Bio-Rad ATAC-Seq chemistry.
Software
released
bioraddbg ATAC-seq deconvolute — source
This Docker container provides an easy to use Docker interface to the BAP tool with settings tailored for Bio-Rad ATAC-seq chemistry.
Software
released
guppy_basecaller — source
Ont-Guppy is a basecalling software available to Oxford Nanopore customers. For more information, please see https://nanoporetech.com/
Software
released
MAGECK — source
Model-based Analysis of Genome-wide CRISPR-Cas9 Knockout (MAGeCK) is a computational tool to identify important genes from the recent genome-scale CRISPR-Cas9 knockout screens technology.
Software type: other
Software
released
RELICS — source
RELICS is an analysis method for discovering functional sequences from tiling CRISPR screens.
Software type: quantification
Software
released
polyAsite_workflow — source
Pipeline to infer poly(A) site clusters through processing of 3' end sequencing libraries prepared according to various protocols.
Software
released
gencode_utr_fix — source
This package fixes UTR features in the third column of GENCODE GTF files by converting the UTR annotation into five_prime_utr and three_prime_utr, similar to Ensembl.
Software
released
pyfaidx — source
This python module implements pure Python classes for indexing, retrieval, and in-place modification of FASTA files using a samtools compatible index.
Software
released
seqkit — source
A cross-platform and ultrafast toolkit for FASTA/Q file manipulation in Golang
Software
released
STARsolo — source
STARsolo is a tool for mapping, demultiplexing, and quantification for single cell RNA-seq.
Software
released
HTSlib — source
A C library for reading/writing high-throughput sequencing data
Software
released
fastp — source
Tool for preprocessing fastq files
Software
released
interpretation_samples — source
Interpretation code for Segway samples that produces classifier output and diagnostic plots from the apply_samples.py, for test samples.
Software type: genome segmentation
Software
released
Scanpy — source
Scanpy is a scalable toolkit for analyzing single-cell gene expression data built jointly with anndata. It includes preprocessing, visualization, clustering, trajectory inference and differential expression testing.
Software
released
Seurat — source
Seurat is an R package designed for QC, analysis, and exploration of single-cell RNA-seq data.
Software
released
split-pipe — source
The Parse Biosciences computational pipeline is an out-of-the-box software tool that you can run locally to convert fastq files straight to processed data (including gene-cell count matrices). Customers purchasing the Whole Transcriptome Kit will receive access to the Parse computational pipeline.
Software
released
dREG — source
Detecting Regulatory Elements using GRO-seq and PRO-seq
Software
released
bigWigMerge — source
This tool from kentUtils merges together multiple bigWigs into a single output
Software
released
PRINSEQ Lite — source
PRINSEQ will preprocess genomic or metagenomic sequence data in FASTA or FASTQ format
Software
released
Pairix — source
Pairix is a tool for indexing and querying on a block-compressed text file containing pairs of genomic coordinates.
Software
released
IsoSeq3 — source
IsoSeq3, a tool in the SMRTanalysis software suite available from Pacific Biosciences, contains tools for identifying transcripts (detecting polyA tails and concatemers, read clustering, and deduplication).
Software
released
Lima — source
Lima, a tool in the SMRTanalysis software suite available from Pacific Biosciences, removes primers and demultiplexes barcodes.
Software
released
CCS — source
CCS (Circular Consensus), a tool in the SMRTanalysis software suite available from Pacific Biosciences, generates highly accurate single-molecule consensus reads.
Software
released
liftOver
This UCSC tool converts genome coordinates and genome annotation files between assemblies.
Software
released
csaw — source
Detection of differentially bound regions in ChIP-seq data with sliding windows, with methods for normalization and proper FDR control.
Software type: peak caller
Software
released
fgbio — source
A set of tools to analyze genomic data with a focus on Next Generation Sequencing.
Software
released
fastq-tools — source
A collection of small and efficient programs for performing some common and uncommon tasks with FASTQ files.
Software type: other
Software
released
UMI-tools — source
Tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes.
Software type: other
Software
released
BBDUK — source
This tool from the BBMap package filters, trims, or masks reads with kmer matches to an artifact/contaminant file.
Software
released
Cell Ranger — source
Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate feature-barcode matrices and perform clustering and gene expression analysis (mkfastq, count, aggr, and reanalyze).
Software
released
bam2pairs — source
This script converts a paired-end bam file to a pairs file.
Software type: file format conversion
Software
released
CPU — source
ChIA-PET Utilities is a collection of efficient specialized programs for processing ChIA-PET data from raw reads to interactions.
Software
released
SURVIVOR — source
SURVIVOR is a tool set for simulating/evaluating SVs, merging and comparing SVs within and among samples, and includes various methods to reformat or summarize SVs.
Software
released
pbsv — source
pbsv is a suite of tools to call and analyze structural variants in diploid genomes from PacBio single molecule real-time sequencing (SMRT) reads. The tools power the Structural Variant Calling analysis workflow in PacBio's SMRT Link GUI. pbsv calls insertions, deletions, inversions, duplications, and translocations. Both single-sample calling and joint (multi-sample) calling are provided.
Software
released
Sniffles — source
Sniffles is a structural variation caller using third generation sequencing (PacBio or Oxford Nanopore). It detects all types of SVs (10bp+) using evidence from split-read alignments, high-mismatch regions, and coverage analysis.
Software
released
NGMLR — source
NGMLR is a long-read mapper designed to align PacBio or Oxford Nanopore (standard and ultra-long) to a reference genome with a focus on reads that span structural variations.
Software
released
HapCUT2 — source
HapCUT2 is a maximum-likelihood-based tool for assembling haplotypes from DNA sequence reads.
Software
released
freebayes — source
freebayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.
Software
released
Pysam
Python module wrapping the htslib C-API and samtools for accessing SAM-formatted alignment files
Software type: other
Software
released
MATS — source
MATS is a computational tool to detect differential alternative splicing events from RNA-Seq data. The statistical model of MATS calculates the P-value and false discovery rate that the difference in the isoform ratio of a gene between two conditions exceeds a given user-defined threshold. From the RNA-Seq data, MATS can automatically detect and analyze alternative splicing events corresponding to all major types of alternative splicing patterns. MATS handles replicate RNA-Seq data from both paired and unpaired study design.
Software
released
Bowtie 2
Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
Software
released
bigWigToWig — source
The binary bigWig format can be converted to the text based wig or bedGraph formats using this utility.
Software type: file format conversion
Software
released
PyLiftover — source
PyLiftover is a library for quick and easy conversion of genomic (point) coordinates between different assemblies. It uses the same logic and coordinate conversion mappings as the UCSC liftOver tool.
Software type: other
Software
released
trim-adapters-illumina — source
This program will trim adapters from paired-end sequencing tags produced using the Illumina(c) platform.
Software type: filtering
Software
released
caTools — source
Several basic utility functions including: moving (rolling, running) window statistic functions, read/write for GIF and ENVI binary files, fast calculation of AUC, LogitBoost classifier, base64 encoder/decoder, round-off-error-free sum and cumsum, etc.
Software
released
edwBamFilter — source
Remove reads from a BAM file based on a number of criteria
Software type: filtering
Software
released
edwBamStats — source
Collect some basic characterization statistics of a BAM file.
Software type: quality metric
Software
released
Cufflinks — source
Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples.
Software
released
GATK — source
The Genome Analysis Toolkit or GATK is a software package for analysis of high-throughput sequencing data, developed by the Data Science and Data Engineering group at the Broad Institute. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size.
Software type: variant annotation
Software
released
SeparateReadpairs — source
Splits up the interleaved file into two valid paired fastqs.
Software type: other
Software
released
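As a rough illustration of what splitting an interleaved FASTQ file involves, the sketch below separates alternating records into two lists. It is not the SeparateReadpairs implementation: the function name `split_interleaved` is hypothetical, and the sketch assumes four-line records with mates strictly alternating (read 1, read 2, read 1, ...).

```python
from itertools import islice


def split_interleaved(lines):
    """Split interleaved FASTQ lines into (read1_lines, read2_lines).

    Assumptions of this sketch: every record is exactly four lines and
    mates alternate R1, R2, R1, R2, ...
    """
    it = iter(lines)
    read1, read2 = [], []
    while True:
        rec1 = list(islice(it, 4))
        if not rec1:
            break  # clean end of input
        rec2 = list(islice(it, 4))
        if len(rec1) != 4 or len(rec2) != 4:
            raise ValueError("truncated record or read without a mate")
        read1.extend(rec1)
        read2.extend(rec2)
    return read1, read2


# Tiny made-up example: one read pair.
example = [
    "@r1/1", "ACGT", "+", "IIII",
    "@r1/2", "TTGA", "+", "IIII",
]
r1, r2 = split_interleaved(example)
```

In practice the two lists would be written to separate files, one per mate, keeping record order identical so that the files remain valid as a pair.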
NextGenMap — source
NextGenMap is a flexible and fast read mapping program that is more than twice as fast as BWA while achieving a mapping sensitivity similar to Stampy.
Software type: aligner
Software
released
Scalpel — source
Scalpel is a software package for detecting INDELs (INsertions and DELetions) mutations in a reference genome which has been sequenced with next-generation sequencing technology (e.g., Illumina).
Software type: variant annotation
Software
released
BamTools — source
C++ API & command-line toolkit for working with BAM data
Software
released
Subread — source
High-performance read alignment, quantification and mutation discovery
Software
released
Trimmomatic — source
A flexible read trimming tool for Illumina NGS data
Software
released
deepTools — source
Tools to process and analyze deep sequencing data.
Software
released
gnuplot — source
Gnuplot is a portable command-line driven graphing utility for Linux, OS/2, MS Windows, OSX, VMS, and many other platforms. The source code is copyrighted but freely distributed (i.e., you don't have to pay for it). It was originally created to allow scientists and students to visualize mathematical functions and data interactively, but has grown to support many non-interactive uses such as web scripting. It is also used as a plotting engine by third-party applications like Octave. Gnuplot has been supported and under active development since 1986.
Software type: other
Software
released
Preseq — source
From the Smith lab: "The preseq package is aimed at predicting and estimating the complexity of a genomic sequencing library, equivalent to predicting and estimating the number of redundant reads from a given sequencing depth and how many will be expected from additional sequencing using an initial sequencing experiment."
Software type: other
Software
released
CpG methylation correlation — source
Calculates the Spearman correlation of two replicate bedMethyl files of CpG methylation.
Software type: quality metric
Software
released
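For readers unfamiliar with the statistic, Spearman correlation is the Pearson correlation of the ranks. The sketch below is a minimal pure-Python version, assuming the two lists already hold methylation percentages for the same CpG sites in the same order. How the actual tool selects or filters sites is not stated here, and the function names are illustrative.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up methylation percentages for four shared CpG sites.
rep1 = [10.0, 50.0, 90.0, 100.0]
rep2 = [12.0, 45.0, 95.0, 99.0]
rho = spearman(rep1, rep2)
```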
cutadapt — source
Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
Software type: other
Software
released
Trim Galore — source
Trim Galore! is a wrapper script to automate quality and adapter trimming as well as quality control, with some added functionality to remove biased methylation positions for RRBS sequence files (for directional, non-directional (or paired-end) sequencing).
Software type: filtering
Software
released
permseq — source
An R package that performs multi-read mapping of ChIP-seq datasets. permseq works with bowtie and takes as input fastq files. It can work with just ChIP-seq data or, when other complementary data such as DNase-seq, histone ChIP-seq are available, it can utilize these data sources as prior information for multi-read mapping. The output from permseq is a text file of aligned reads available in bed, tagAlign, or bam formats.
Software
released
mosaics — source
An R package for TF and histone ChIP-seq analysis. mosaics takes as input the aligned files. It provides diagnostics plots for evaluating how well the mosaics model fits and allows FDR control. The mosaics-hmm module provides boundary adjusted broad peak calls. The output from mosaics is a set of peaks in a number of formats including bed. mosaics also generates intermediate data files/objects such as genome-wide read counts at the bin level for specified bin sizes, wig files for visualizing on the browser.
Software
released
atSNP — source
An R package for screening SNPs for their potential to enhance or disrupt transcription factor binding sites. atSNP accepts as input either SNP ids or the actual coordinates of the SNPs and the alternative alleles. It uses ENCODE motifs and JASPAR motifs to evaluate the regulatory potential of the SNPs; however, it also allows user specified set of transcription factor binding sites in the form of position specific matrices. It outputs for each SNP the significance of the match to each position specific matrix with both the reference and the alternative allele and also the significance of the change in these match scores. atSNP also provides easy visualization of the SNP impact on the binding site by composite logo plots.
Software
released
wigToBigWig — source
The bigWig format is for display of dense, continuous data that will be displayed in the Genome Browser as a graph. BigWig files are created initially from wiggle (wig) type files, using the program wigToBigWig. The resulting bigWig files are in an indexed binary format. The main advantage of the bigWig files is that only the portions of the files needed to display a particular region are transferred to UCSC, so for large data sets bigWig is considerably faster than regular wiggle files.
Software type: file format conversion
Software
released
R — source
R is a free software environment for statistical computing and graphics
Software
released
Median Absolute Deviation — source
Calculates the Median Absolute Deviation (MAD) and correlation of two gene quantifications from replicate RNA-seq experiments. A measure of reproducibility, inversely correlated with data quality.
Software type: quality metric
Software
released
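The median absolute deviation itself is simple to state: the median of the absolute differences from the median. The sketch below applies it to log2 ratios of two replicate quantifications. That choice of input, the pseudocount, and the function names are assumptions of this example rather than a description of the tool above.

```python
import math
from statistics import median


def mad(values):
    """Median absolute deviation: median(|x - median(x)|)."""
    values = list(values)
    m = median(values)
    return median(abs(v - m) for v in values)


def log_ratio_mad(quant1, quant2, pseudocount=1.0):
    """MAD of per-gene log2 ratios between two replicates.

    The pseudocount (an assumption of this sketch) avoids log(0)
    for genes with zero expression.
    """
    ratios = [
        math.log2((a + pseudocount) / (b + pseudocount))
        for a, b in zip(quant1, quant2)
    ]
    return mad(ratios)


# Made-up expression values for five genes in two replicates.
rep1 = [0.0, 5.0, 10.0, 200.0, 35.0]
rep2 = [0.0, 6.0, 9.0, 180.0, 40.0]
score = log_ratio_mad(rep1, rep2)
```

Identical replicates give a score of 0, and larger values indicate more disagreement, which is consistent with the description of the measure as inversely related to data quality.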
Tophat BAM Repair — source
tophat_bam_xsA_tag_fix.pl was written by x wei to allow the use of tophat 2.0.8 in the ENCODE pipelines. It reads a bam file generated from paired-end fastqs by tophat 2.0.8 and corrects the XS:A:+ or XS:A:- tags showing read strandedness.
Software type: other
Software
released
Concat-fastq — source
Concat-fastqs is an applet available for DNAnexus to concatenate a set of fastqs that should be merged for analysis.
Software type: other
Software
released
bedToBigBed — source
bedToBigBed takes a standard bed file or a non-standard bed file with associated .as file to create a compressed bigBed version. Description of Big Binary Indexed (BBI) files and visualization of next-generation sequencing experiment results explained by W.J. Kent, PMCID: PMC2922891.
Software type: file format conversion
Software
released
Mott Trimmer — source
Mott-Trimmer for WGBS reads
Software
released
Makepseudoreplicates
Generate pseudoreplicates for self-consistency tests.
Software type: other
Software
released
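One common way to build pseudoreplicates is to shuffle the reads of a dataset and split them into two halves. The sketch below illustrates that idea only. Whether the tool above works exactly this way is not stated, and `make_pseudoreplicates` and its `seed` parameter are hypothetical names.

```python
import random


def make_pseudoreplicates(reads, seed=0):
    """Randomly partition reads into two roughly equal pseudoreplicates.

    A fixed seed keeps the split reproducible (an assumption of this
    sketch, not a documented option of any tool).
    """
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Made-up read identifiers standing in for aligned reads.
reads = [f"read_{i}" for i in range(10)]
pr1, pr2 = make_pseudoreplicates(reads)
```

Every read lands in exactly one of the two outputs, so downstream self-consistency comparisons see disjoint samples of the same library.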
WGBS output processor — source
Convert a Bismark CX_report file to bed-like files
Software type: other
Software
released
Samtools — source
Samtools is a suite of programs for interacting with high-throughput sequencing data. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, a variant caller, and an alignment viewer, and thus provides universal tools for processing read alignments (PMID:19505943).
Software type: other
Software
released
Picard — source
A set of tools (in Java) for working with next generation high-throughput sequencing (HTS) data in the BAM format. Picard is implemented using the HTSJDK Java library, which supports access to common file formats, such as SAM and VCF, used for high-throughput sequencing data. There is currently no published paper for the Picard software.
Software type: filtering, other
Software
released
Bismark — source
A tool to map bisulfite converted sequence reads and determine cytosine methylation states. The output produced by Bismark discriminates between cytosines in CpG, CHG and CHH context and enables bench scientists to visualize and interpret their methylation data soon after the sequencing run is completed (PMID: 21493656).
Software type: other
Software
released
RSEM — source
A software package for estimating gene and isoform expression levels from RNA-Seq data. RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.
Software type: transcript identification
Software
released
TopHat — source
Fast splice junction mapper for RNA-Seq reads. It aligns RNA-Seq reads to mammalian-sized genomes using the ultra high-throughput short read aligner Bowtie, and then analyzes the mapping results to identify splice junctions between exons.
Software type: aligner
Software
released
Flux Capacitor — source
The exonic structure of two spliceforms. FluxCapacitor reconstructs abundances of known transcript forms from RNA-seq data (PMCID: PMC3836232).
Software type: transcript identification
Software
released
BWA — source
BWA is a software package for mapping low-divergent sequences based on a Burrows-Wheeler index against a large reference genome, such as the human genome. The publication for the short read alignment component is PMID: 19451168, while PMID: 20080505 outlines the algorithm to align sequences >200bp up to 1Mb.
Software type: aligner
Software
released
npIDR — source
Non-parametric Irreproducible Discovery Rate (npIDR) essentially takes a pooled sample of all replicates and computes (a) the frequency of seeing count=x; (b) the frequency of seeing count=x given that in *ALL* other replicates the count is equal to zero. The original irreproducible discovery rate statistical test is published at DOI: 10.1214/11-AOAS466.
Software type: quality metric
Software
released
FastQC — source
FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis. Babraham Bioinformatics Web site, http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Software type: quality metric
Software
released
CLIPper — source
A tool to detect CLIP-seq peaks. The Yeo lab describes their CLIP-seq cluster-identification algorithm on PMID: 24213538
Software type: quality metric, peak caller
Software
released
bedGraphToBigWig — source
Convert bedGraph to bigWig file. Description of Big Binary Indexed (BBI) files and visualization of next-generation sequencing experiment results explained by W.J. Kent, PMCID: PMC2922891
Software type: file format conversion
Software
released
WASP — source
WASP is a software package for two related tasks: (1) correcting allelic bias in mapped sequencing reads and, (2) identifying molecular quantitative trait loci (QTLs) using next-generation sequencing data (e.g. gene expression QTLs or histone mark QTLs). The WASP mapper works with any read mapping pipeline that outputs BAM or SAM format. WASP identifies molecular QTLs using a statistical test that combines information about the total depth and allelic imbalance of mapped reads. WASP can call QTLs with very small sample sizes (as few as 10) compared to traditional QTL mapping approaches.
Software type: aligner, variant annotation
Software
released
Webgestalt — source
WebGestalt is a "WEB-based GEne SeT AnaLysis Toolkit". It is designed for functional genomic, proteomic and large-scale genetic studies from which a large number of gene lists (e.g., differentially expressed gene sets, co-expressed gene sets, etc.) are continuously generated. WebGestalt incorporates information from different public resources and provides an easy way for biologists to make sense out of gene lists.
Software
released
RuleFit3 — source
RuleFit3 is a predictive learning method and interpretational tool. It is based on general regression and classification models, which are constructed as linear combinations of simple rules derived from the data. Each rule consists of a conjunction of a small number of simple statements concerning the values of individual input variables (doi:10.1214/07-AOAS148).
Software
released
Peppy — source
Peppy is software that integrates several critical tasks of proteogenomic searching and proteogenomic mapping such as: Full 6-frame translation and digestion of a genome, peptide/spectrum matching and quality assessment, and calculation of false discovery rates (PMID: 23614390).
Software
released
Mfinder — source
mfinder is a software tool for network motif detection. Network motifs are defined as basic interaction patterns that recur throughout biological networks, much more often than in random networks. To detect network motifs, mfinder implements two methods: a full enumeration of subgraphs and a sampling of subgraphs for estimation of subgraph concentrations. mfinder generates random networks based on the switching method, the stubs method and the "Go with the winners" algorithm.
Software
released
lumi package — source
The lumi package in R provides an integrated solution for the Illumina microarray data analysis. It includes functions of Illumina BeadStudio (GenomeStudio) data input, quality control, BeadArray-specific variance stabilization, normalization and gene annotation at the probe level. It also includes the functions of processing Illumina methylation microarrays, especially Illumina Infinium methylation microarrays.
Software
released
King — source
KING is a rapid algorithm for relationship inference using high-throughput genotype data typical of GWAS that allows the presence of an unknown population substructure. The relationship of any pair of individuals can be precisely inferred by robust estimation of their kinship coefficient, independent of sample composition or population structure (sample invariance). KING performs properly even under extreme population stratification, while algorithms assuming a homogeneous population give systematically biased results. KING performs relationship inference on millions of pairs of individuals in the order of minutes.
Software
released
Java Treeview — source
Java Treeview is an open source, cross-platform gene expression visualization tool and an interactive display of clustered gene expression data, similar to Eisen's treeview. It is also an extensible starting point for other gene expression visualization tools.
Software
released
HiveR — source
The hive plot is a visualization method for drawing networks. Nodes are mapped to and positioned on radially distributed linear axes. Edges are drawn as curved links. Hive plots can give a quantitative understanding of important aspects of a network's structure. Hive plots can also manage the visual complexity arising from a large number of edges and expose both trends and outlier patterns in a network structure.
Software
released
GSC (Genome Structure Correction)
Assessing the significance of observations within large scale genomic studies using random subsampled genomic region is a difficult problem because there often exists a complex dependency structure between observations. GSC is a data subsampling approach based on a block stationary model for genomic features to alleviate the hidden dependencies. This model is motivated by earlier studies of DNA sequences, which show that there are global shifts in base composition, but that certain sequence characteristics are locally unchanging.
Software
released
GREAT — source
GREAT assigns biological meaning to a set of non-coding genomic regions by analyzing the annotations of the nearby genes. Thus, it is particularly useful in studying cis functions of sets of non-coding genomic regions. Cis-regulatory regions can be identified via both experimental methods (e.g., ChIP-seq) and by computational methods (e.g. comparative genomics).
Software
released
GOstats — source
GOstats is a set of tools implemented in R Bioconductor for interacting with GO and microarray data. It provides a variety of basic manipulation tools for graphs, hypothesis testing including hypergeometric tests, and visualization tools.
Software
released
GOrilla — source
GOrilla is a web-based application that identifies enriched GO terms in ranked lists of genes, without requiring the user to provide explicit target and background sets. It also employs a flexible threshold statistical approach to discover GO terms that are significantly enriched at the top of a ranked gene list. Building on a complete theoretical characterization of the underlying distribution, GOrilla computes an exact p-value for the observed enrichment, taking threshold multiple testing into account without the need for simulations. The output of the enrichment analysis is visualized as a hierarchical structure, providing a clear view of the relations between enriched GO terms.
Software
released
GFS — source
GFS is a program that maps peptide mass fingerprint data directly to raw genomic sequence, enabling rapid low-cost identification of proteins in genomes for which annotation is lacking. An experimentally obtained peptide mass fingerprint is entered into the program, which then scans a genome sequence of interest and outputs the most likely regions of the genome from which the mass fingerprint is derived.
Software
released
GERP — source
GERP identifies constrained elements in multiple alignments by quantifying substitution deficits. These deficits represent substitutions that would have occurred if the element were neutral DNA, but did not occur because the element has been under functional constraint. These deficits, or rejected substitutions, are a natural measure of constraint that reflects the strength of past purifying selection on the element. GERP estimates constraint for each alignment column; elements are identified as excess aggregations of constrained columns. A false-positive rate (which is user-settable) is calculated using 'shuffled' alignments in which the order of columns is randomized.
Software
released
F-seq — source
F-seq is a software package that generates a continuous density estimation of sequence tags mapped to a reference genome, which can be displayed using the UCSC Genome Browser. The continuous density plots are more intuitive than discrete histogram-like plots used by some applications. Using kernel density estimation, F-seq can aid the identification of biologically meaningful sites.
Software type: peak caller
Software
released
FANMOD — source
FANMOD is a tool for fast network motif detection. It relies on recently developed algorithms to improve the efficiency of network motif detection by orders of magnitude. This facilitates the detection of larger motifs in bigger networks than previously possible. Additional benefits of FANMOD are the ability to analyze colored networks, a graphical user interface and the ability to export results to a variety of machine-readable and human-readable file formats, including comma-separated values and HTML.
Software
released
DAVID — source
DAVID is able to extract biological features and meanings associated with large gene lists. DAVID is able to handle any type of gene list, no matter which genomic platform or software package generated them. DAVID systematically maps a large number of interesting genes in a list to the associated biological annotation (e.g., gene ontology terms), and then statistically highlights the most overrepresented (enriched) biological annotation out of thousands of linked terms and contents.
Software
released
Cluster 3.0 — source
Cluster 3.0 is an implementation of k-means clustering, hierarchical clustering and self-organizing maps in a single multi-purpose open-source library of C routines, callable from other C and C++ programs. This library is an improved version of Michael Eisen's well-known Cluster program for Windows, Mac OS X and Linux/Unix. Additionally a Python and a Perl interface to the C Clustering Library is implemented to combine the flexibility of a scripting language with the speed of C.
Software
released
Circos — source
Circos is a software package for visualizing data and information. It visualizes data in a circular layout for exploring relationships between objects or positions. Circos creates publication-quality infographics and illustrations with a high data-to-ink ratio, layered data and symmetries.
Software
released
Bowtie — source
Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small: typically about 2.2 GB for the human genome (2.9 GB for paired-end).
Software
released
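To give a feel for the Burrows-Wheeler transform that underlies such an index, here is a deliberately naive sketch that sorts all rotations of a short string. Real aligners build the transform from suffix arrays and add auxiliary structures for searching. The function name `bwt` and the `$` end marker are conventions chosen for this example.

```python
def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler transform via sorted rotations.

    Quadratic memory, so only suitable for toy inputs. The sentinel
    must not occur in the text and must sort before every other
    character (true for '$' versus A, C, G, T in ASCII).
    """
    if sentinel in text:
        raise ValueError("sentinel must not appear in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix.
    return "".join(rotation[-1] for rotation in rotations)


transformed = bwt("ACAACG")
```

The transform tends to group identical characters together, which is what makes the resulting index compressible while still supporting fast exact-match lookups.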
BEDTools — source
Collectively, the bedtools utilities are a swiss-army knife of tools for a wide-range of genomics analysis tasks. The most widely-used tools enable genome arithmetics: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF/GTF, and VCF.
Software type: file format conversion
Software
released
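The "set theory on the genome" idea in the BEDTools entry above can be illustrated with a minimal pure-Python sketch. This is not bedtools code: the function names `merge` and `intersect` are hypothetical, intervals are assumed to be half-open `(chrom, start, end)` tuples as in BED, and the sketch assumes that touching (book-ended) intervals should be merged.

```python
from typing import List, Tuple

# Hypothetical interval type: (chrom, start, end), half-open like BED.
Interval = Tuple[str, int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Combine overlapping or touching intervals on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portion of every pair of intervals from a and b.

    Quadratic in the input sizes; fine for illustration, not for real data.
    """
    out: List[Interval] = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # non-empty overlap only
                out.append((chrom_a, start, end))
    return sorted(out)


peaks = merge([("chr1", 1, 5), ("chr1", 3, 8), ("chr1", 10, 12), ("chr2", 1, 4)])
shared = intersect(peaks, [("chr1", 5, 10), ("chr2", 2, 3)])
```

Real tools use sorted sweeps or interval trees so that these operations scale to millions of features; the nested loop here only shows the set logic.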
ANNOVAR — source
ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes (including human genome hg18, hg19, as well as mouse, worm, fly, yeast and many others). Given a list of variants with chromosome, start position, end position, reference nucleotide and observed nucleotides, ANNOVAR can perform: (i) Gene-based annotation: identify whether SNPs or CNVs cause protein coding changes and the amino acids that are affected. (ii) Region-based annotation: identify variants in specific genomic regions, for example, conserved regions among 44 species, predicted transcription factor binding sites, segmental duplication regions, GWAS hits, database of genomic variants, DNase I hypersensitivity sites, ENCODE H3K4Me1/H3K4Me3/H3K27Ac/CTCF sites, ChIP-Seq peaks, RNA-Seq peaks, or many other annotations on genomic intervals. (iii) Filter-based annotation: identify variants that are reported in dbSNP, identify the subset of common SNPs (MAF>1%) in the 1000 Genomes Project, identify the subset of non-synonymous SNPs with SIFT score>0.05, find intergenic variants with GERP++ score>2, or many other annotations on specific mutations.
Software
released
Regulatory Elements Database — source
Using an intuitive interface, you can 1) identify DNaseI-hypersensitive sites (DHS) within a genomic region of interest, 2) predict the target gene for DHS of interest, 3) predict the DHS that regulate a gene of interest, 4) identify clusters of similarly regulated DHS, that may have related function, 5) identify enriched motifs for transcription factors that may bind in these similarly regulated DHS, and 6) identify DHS that contain a DNA sequence motif for a transcription factor of interest. The Regulatory Elements Database provides access to roughly 2.8 million DNaseI-hypersensitive sites and their signal in 112 human samples, as well as Affymetrix microarray expression data for the same cell-types.
Software type: database
Software
released
Spark — source
Spark is an interactive pattern discovery and visualization approach designed with epigenomic data in mind. Spark can reveal both known and novel epigenetic signatures.
Software type: database
Software
released
ENCODE-motifs — source
A database that uncovers the molecular basis of TF binding in the human genome based on regulatory motif analysis of all Transcription Factors (TFs) grouped by family. This allows browsing of all known motifs for each factor, curated from TRANSFAC, Jaspar, and Protein Binding Microarray (PBM) experiments, and their enrichment and instances within corresponding TF binding experiments. It also provides a list of novel regulatory motifs discovered by systematic application of several motif discovery tools (including MEME, MDscan, Weeder, AlignACE) and evaluated based on their enrichment relative to control motifs within TF-bound regions. ENCODE-motifs also provides a genome-wide map of regulatory motif instances in the human genome for both known and novel motifs.
Software type: database
Software
released
Factorbook — source
Factorbook is a transcription factor (TF)-centric web-based repository of integrative analysis associated with ENCODE ChIP-seq data. It includes de novo discovered motifs, chromatin features surrounding ChIP-seq peaks (histone modification patterns, DNase I cleavage footprints, and nucleosome positioning profiles), deep-learned models of sequence features driving TF binding, and integration with GWAS variants and the ENCODE Registry of candidate cis-regulatory elements.
Software type: database
Software
released
PIQ: Protein Interaction Quantification — source
PIQ is a computational method that models the magnitude and shape of genome-wide DNase profiles to facilitate the identification of transcription factor (TF) binding sites. The input of PIQ is one or more DNase-seq experiments, the genome sequence of the organism assayed and a list of motifs represented as position weight matrices (PWMs) that describe candidate TF binding sites. PIQ uses machine learning methods to normalize input DNase-seq data and then predicts TF binding by detecting both the shape and magnitude of DNase profiles specific to each TF. The output of PIQ is the probability of occupancy for each candidate binding site in the genome, along with aggregate TF-specific scores (e.g. metrics for TF-specific chromatin opening).
Software type: database
Software
released
RegulomeDB — source
Identifies DNA features and regulatory elements in non-coding regions of the human genome. One can enter dbSNP IDs, BED files, VCF files, or GFF3 files. A score is returned assessing the evidence for regulatory potential. Clicking on the score reveals the data supporting the inference, by data type and cell type. One can also click on hyperlinks to see the SNP or the region in the UCSC browser, ENSEMBL browser, and dbSNP.
Software type: database, variant annotation
Software
released
HaploReg — source
Explores annotations of the noncoding genome at variants on haplotype blocks, such as candidate regulatory SNPs at disease-associated loci. Under the Set Options tab, set the Browse ENCODE button to "on" and select an LD threshold and reference population. Under the Build Query tab, enter a SNP (rsXXXXX), a set of SNPs, a genomic region, or select a GWAS from the drop-down menu. HaploReg returns SNPs in LD with the query SNPs and their frequency in 4 populations from 1000 Genomes Phase 1. It also reports the evidence ENCODE has found for regulatory protein binding (mouse over to see the protein names), chromatin structure (mouse over to see the cell types with DNase hypersensitivity), the chromatin state of the region (the chromatin state can predict an enhancer or promoter), and putative transcription factor binding motifs that are altered by the variant. Clicking on the SNP name hyperlink reveals further details, including cell type metadata and the mechanism of disruption/creation of TF binding regulatory motifs (showing the PWM matched and its alignment to the local sequence context). SNPs are also intersected with cross-species conserved elements, chromatin states from the Roadmap Epigenomics Consortium, and lead eQTLs from the GTEx Project browser.
Software type: database, variant annotation
Software
released
Genomedata — source
Efficiently stores multiple tracks of numeric data anchored to a genome. The format allows fast random access to hundreds of gigabytes of data, while retaining a small disk space footprint. Utilities have also been developed to load data into this format. A reference implementation in Python and C components is available under the GNU General Public License.
Software
released
BEDOPS — source
Performs common genomic analysis tasks and offers improved flexibility, scalability and execution time characteristics over previously published packages. The suite includes a utility to compress large inputs into a lossless format that can provide greater space savings and faster data extractions than alternatives.
Software type: file format conversion
Software
released
SPOT (Signal Portion of Tags) — source
Measures signal-to-noise in genome-wide epigenetic profiling assays by calculating the fraction of reads that fall in tag-enriched regions (see the Hotspot program) from a sample of 5 million reads. SPOT is simply the percentage of all tags that fall in hotspots. The SPOT methodology can be generalized to use any peak-finder. A publication on SPOT with a more complete description is in preparation; the publication for the Hotspot quality metric is found at PMID: 21258342.
Software type: quality metric
Software
released
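The SPOT entry above defines the metric as the fraction of tags that fall in hotspots. A minimal sketch of that calculation follows; it is not the SPOT program itself. The function name `spot_score` and the input layout are hypothetical, and the sketch assumes each tag is reduced to a single `(chrom, position)` and that hotspots are sorted, non-overlapping, half-open `(start, end)` intervals per chromosome.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple


def spot_score(
    tags: List[Tuple[str, int]],
    hotspots: Dict[str, List[Tuple[int, int]]],
) -> float:
    """Fraction of tags whose position lies inside any hotspot interval.

    Assumes hotspots[chrom] is sorted by start and non-overlapping,
    with half-open [start, end) coordinates.
    """
    if not tags:
        return 0.0
    # Precompute the start coordinates once per chromosome for binary search.
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in hotspots.items()}
    in_hotspot = 0
    for chrom, pos in tags:
        if chrom not in hotspots:
            continue
        # Index of the last hotspot starting at or before pos, if any.
        idx = bisect_right(starts[chrom], pos) - 1
        if idx >= 0 and pos < hotspots[chrom][idx][1]:
            in_hotspot += 1
    return in_hotspot / len(tags)


example_hotspots = {"chr1": [(100, 200), (500, 600)]}
example_tags = [("chr1", 150), ("chr1", 250), ("chr1", 500), ("chr2", 150)]
score = spot_score(example_tags, example_hotspots)
```

In practice the tags would be the 5 million sampled reads mentioned in the entry, and the hotspot intervals would come from Hotspot or another peak-finder.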
Phantompeakqualtools — source
Generates the NSC and RSC quality metrics. The NSC (normalized strand cross-correlation) and RSC (relative strand cross-correlation) metrics use cross-correlation of stranded read density profiles to measure enrichment independently of peak calling.
Software type: quality metric, filtering
Software
released
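The NSC and RSC metrics from the Phantompeakqualtools entry above can be sketched once a strand cross-correlation profile is available. The entry does not give the formulas. The definitions used here follow the commonly cited ENCODE ChIP-seq guidelines and should be read as an assumption: NSC is the cross-correlation at the fragment-length shift divided by the minimum cross-correlation, and RSC is the fragment-length value minus the minimum, divided by the read-length ("phantom peak") value minus the minimum. The function name `nsc_rsc` and its inputs are hypothetical.

```python
from typing import Dict, Tuple


def nsc_rsc(
    cross_corr: Dict[int, float],
    fragment_shift: int,
    read_length_shift: int,
) -> Tuple[float, float]:
    """Compute (NSC, RSC) from a precomputed cross-correlation profile.

    cross_corr maps strand shift (bp) to the cross-correlation value.
    fragment_shift is the shift at the fragment-length peak;
    read_length_shift is the shift at the phantom (read-length) peak.
    """
    background = min(cross_corr.values())  # minimum over all shifts
    cc_fragment = cross_corr[fragment_shift]
    cc_phantom = cross_corr[read_length_shift]
    nsc = cc_fragment / background
    rsc = (cc_fragment - background) / (cc_phantom - background)
    return nsc, rsc


# Toy profile with made-up values, chosen only to keep the arithmetic exact.
profile = {0: 0.25, 36: 0.5, 150: 0.75, 400: 0.25}
nsc, rsc = nsc_rsc(profile, fragment_shift=150, read_length_shift=36)
```

The sketch does not guard against a zero background or a phantom peak equal to the background; a production implementation would need to handle both cases.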
TIP — source
Predicts the targets of transcription factors using the data from ChIP-seq experiments.
Software
released
CAGT (Clustering AGgregation Tool) — source
Deciphers the heterogeneity and diversity of profiles of functional signals (e.g., chromatin mark ChIP-seq signal) centered at a collection of sites (e.g., TSSs or TF binding sites) in a genome. Rather than averaging the profiles over all the anchor sites (traditional aggregation plots), CAGT accounts for the inherent heterogeneity in signal magnitude, shape and implicit strand orientation of chromatin marks. CAGT partitions the set of anchor sites into compact clusters such that each cluster represents anchor points that show similar patterns of the functional signal profiles with different clusters having distinct patterns. The different groups of patterns are often enriched for distinct biological functions (PMID: 22955985).
Software type: genome segmentation
Software
released
ACT — source
Performs aggregation and correlation analyses of genomic tracks.
Software
released
Segway — source
Uses a machine learning method to analyze multiple tracks of functional genomics data, searching for recurring patterns. The software automatically partitions the genome into non-overlapping segments and assigns each segment a label. The resulting annotation provides a human-interpretable summary of the functional landscape of the genome, yielding hypotheses about novel instances or classes of functional elements.
Software type: genome segmentation
Software
released
Segtools — source
A Python package that analyzes genomic segmentations. The software efficiently calculates a variety of summary statistics and produces corresponding publication quality visualizations. The overall goal of Segtools is to provide a bird's-eye view of complex genomic data sets, allowing researchers to easily generate and confirm hypotheses.
Software type: genome segmentation
Software
released
ChromHMM — source
Learns and characterizes chromatin states.
Software type: genome segmentation
Software
released
Wiggler — source
Produces normalized genome-wide signal coverage tracks from raw read alignment files. Allows pooling of replicate datasets while allowing for replicate and data-type specific read shifting and smoothing parameters. It can be used to generate signal density maps for ChIP-seq, DNase-seq, FAIRE-seq and MNase-seq data. Wiggler also implicitly models variability in mappability to appropriately normalize signal density and distinguish missing data from true zero signal.
Software
released
Irreproducible Discovery Rate (IDR) — source
Measures consistency between replicates in high-throughput experiments. Also uses reproducibility in score rankings between peaks in each replicate to determine an optimal cutoff for significance. The core IDR R package can be downloaded from the IDR download page: http://cran.r-project.org/web/packages/idr/index.html
Software type: quality metric, filtering
Software
released
incRNA — source
A computational framework that identifies structured RNAs by combining a variety of gene expression data and several sequence-based metrics.
Software type: transcript identification
Software
released
Hotspot — source
Identifies regions of local enrichment, including peaks, in genomic short-read sequence data. Uses the binomial distribution with a local background model to automatically correct for broad-scale regional differences in tag levels. It is applicable to a wide variety of epigenetic profiling assays, including ChIP-seq and DNase-seq. Hotspot forms the basis of the SPOT data quality metric.
Software type: peak caller
Software
released
AlleleSeq — source
Quantifies allele-specific binding and expression at SNP sites using RNA-seq and ChIP-seq datasets together with diploid genome sequences.
Software
released
Scripture — source
Reconstructs transcriptomes, relying solely on RNA-seq reads and an assembled genome to build a transcriptome ab initio. The statistical methods to estimate read coverage significance are also applicable to other sequencing data. Scripture also has modules for ChIP-seq peak calling.
Software type: transcriptome assembly
Software
released
RSEQtools — source
Processes, quantifies, and annotates gene expression data from RNA-seq experiments. Utilizes Mapped Read Format (MRF) for secure and efficient analysis.
Software
released
IQSeq — source
Uses RNA-seq data for isoform quantification.
Software
released
FusionSeq — source
A computational framework that detects chimeric transcripts from paired-end RNA-seq experiments.
Software type: transcript identification
Software
released
Flux Capacitor — source
A program to estimate the frequencies of annotated transcripts (GTF format) from an RNA-Seq experiment, solving a linear program inferred from the observed read mappings (BAM format). There are options for single, stranded, and/or paired-end reads.
Software
released
STAR — source
STAR (Spliced Transcripts Alignment to a Reference) aligns short and bulk RNA-seq reads to a reference genome using uncompressed suffix arrays.
Software type: aligner
Software
released
MACS — source
A widely used, fast, robust ChIP-seq peak-finding algorithm that accounts for the offset in forward-strand and reverse-strand reads to improve resolution and uses a dynamic Poisson distribution to effectively capture local biases in the genome. MACS 1.4 was used in the ENCODE 2 uniform peak calling pipeline.
Software type: peak caller
Software
released
PeakSeq — source
Identifies enriched regions in ChIP-seq type experiments and explicitly compares signal experiments to control experiments. PeakSeq will be used in the ENCODE 3 uniform peak calling pipeline.
Software type: peak caller
Software
released
GEM — source
GEM is a Java software package for analyzing genome wide ChIP-seq/ChIP-exo data. GEM can decompose single observed peaks into multiple binding events, determine binding event location at high spatial resolution, and discover explanatory DNA sequence motifs with an integrated model of ChIP reads and proximal DNA sequences. GEM is able to process single-end or paired-end data and can be run in single-condition mode or multi-condition mode. GEM will be used in the ENCODE 3 uniform peak calling pipeline.
Software type: peak caller
Software
released
SPP — source
A ChIP-seq peak calling algorithm, implemented as an R package, that accounts for the offset in forward-strand and reverse-strand reads to improve resolution, compares enrichment in signal to background or control experiments, and can also estimate whether the available number of reads is sufficient to achieve saturation, meaning that additional reads would not allow identification of additional peaks. SPP will be used in the ENCODE 3 uniform peak calling pipeline.
Software type: peak caller
Software
released
