ENCODE Software
All software used or developed by the ENCODE Consortium
- pyranges: GenomicRanges for Python.
- pandas: Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
- Zerone: Zerone discretizes several ChIP-seq replicates simultaneously and resolves conflicts between them. Publication available at: doi: 10.1093/bioinformatics/btw336
- GEM-Tools: GEM-Tools is a C API and a Python module to support and simplify usage of the GEM Mapper.
- Fastx Toolkit: The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.
- bioraddbg ATAC-seq MACS2: This Docker container provides an easy-to-use Docker interface to MACS2 for peak calling with settings tailored for Bio-Rad Single Cell ATAC-seq chemistry.
- bioraddbg ATAC-seq filter beads: This Docker container provides an easy-to-use Docker interface to a bead filtration tool with settings tailored for Bio-Rad Single Cell ATAC-seq chemistry. This container takes in .BAM files and performs "knee calling" to compute a bead barcode whitelist and Jaccard index threshold for bead-to-droplet merging.
- bioraddbg ATAC-seq BWA: This Docker container provides an easy-to-use Docker interface to the BWA alignment tool with settings tailored for Bio-Rad ATAC-seq chemistry.
- bioraddbg ATAC-seq deconvolute: This Docker container provides an easy-to-use Docker interface to the BAP tool with settings tailored for Bio-Rad ATAC-seq chemistry.
- guppy_basecaller: Ont-Guppy is basecalling software available to Oxford Nanopore customers. For more information, please see https://nanoporetech.com/
- polyAsite_workflow: Pipeline to infer poly(A) site clusters through processing of 3' end sequencing libraries prepared according to various protocols.
- gencode_utr_fix: This package fixes UTR features in the third column of Gencode GTF files by converting UTR annotations into five_prime_utr and three_prime_utr, similar to Ensembl.
- interpretation_samples: Interpretation code for Segway samples that produces classifier output and diagnostic plots from apply_samples.py for test samples. Software type: genome segmentation
- split-pipe: The Parse Biosciences computational pipeline is an out-of-the-box software tool that you can run locally to convert fastq files straight to processed data (including gene-cell count matrices). Customers purchasing the Whole Transcriptome Kit will receive access to the Parse computational pipeline.
- PRINSEQ Lite: PRINSEQ will preprocess genomic or metagenomic sequence data in FASTA or FASTQ format.
- liftOver: This UCSC tool converts genome coordinates and genome annotation files between assemblies.
- fastq-tools: A collection of small and efficient programs for performing some common and uncommon tasks with FASTQ files. Software type: other
- Cell Ranger: Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate feature-barcode matrices and perform clustering and gene expression analysis (mkfastq, count, aggr, and reanalyze).
- pbsv: pbsv is a suite of tools to call and analyze structural variants in diploid genomes from PacBio single molecule real-time sequencing (SMRT) reads. The tools power the Structural Variant Calling analysis workflow in PacBio's SMRT Link GUI. pbsv calls insertions, deletions, inversions, duplications, and translocations. Both single-sample calling and joint (multi-sample) calling are provided.
- freebayes: freebayes is a Bayesian genetic variant detector designed to find small polymorphisms, specifically SNPs (single-nucleotide polymorphisms), indels (insertions and deletions), MNPs (multi-nucleotide polymorphisms), and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment.
- Pysam: Python module wrapping the htslib C API and samtools for accessing SAM-formatted alignment files. Software type: other
- MATS: MATS is a computational tool to detect differential alternative splicing events from RNA-Seq data. The statistical model of MATS calculates the P-value and false discovery rate that the difference in the isoform ratio of a gene between two conditions exceeds a given user-defined threshold. From the RNA-Seq data, MATS can automatically detect and analyze alternative splicing events corresponding to all major types of alternative splicing patterns. MATS handles replicate RNA-Seq data from both paired and unpaired study designs.
- Bowtie 2: Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.
- bigWigToWig: The binary bigWig format can be converted to the text-based wig or bedGraph formats using this utility. Software type: file format conversion
- PyLiftover: PyLiftover is a library for quick and easy conversion of genomic (point) coordinates between different assemblies. It uses the same logic and coordinate conversion mappings as the UCSC liftOver tool. Software type: other
- trim-adapters-illumina: This program will trim adapters from paired-end sequencing tags produced using the Illumina platform. Software type: filtering
- edwBamFilter: Remove reads from a BAM file based on a number of criteria. Software type: filtering
- edwBamStats: Collect some basic characterization statistics of a BAM file. Software type: quality metric
- GATK: The Genome Analysis Toolkit or GATK is a software package for analysis of high-throughput sequencing data, developed by the Data Science and Data Engineering group at the Broad Institute. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as strong emphasis on data quality assurance. Its robust architecture, powerful processing engine and high-performance computing features make it capable of taking on projects of any size. Software type: variant annotation
- SeparateReadpairs: Splits up the interleaved file into two valid paired fastqs. Software type: other
- NextGenMap: NextGenMap is a flexible and fast read mapping program that is more than twice as fast as BWA while achieving a mapping sensitivity similar to Stampy. Software type: aligner
- gnuplot: Gnuplot is a portable command-line driven graphing utility for Linux, OS/2, MS Windows, OSX, VMS, and many other platforms. The source code is copyrighted but freely distributed (i.e., you don't have to pay for it). It was originally created to allow scientists and students to visualize mathematical functions and data interactively, but has grown to support many non-interactive uses such as web scripting. It is also used as a plotting engine by third-party applications like Octave. Gnuplot has been supported and under active development since 1986. Software type: other
- Preseq: From the Smith lab: "The preseq package is aimed at predicting and estimating the complexity of a genomic sequencing library, equivalent to predicting and estimating the number of redundant reads from a given sequencing depth and how many will be expected from additional sequencing using an initial sequencing experiment." Software type: other
- CpG methylation correlation: Calculates the Spearman correlation of two replicate bedMethyl files of CpG methylation. Software type: quality metric
- Trim Galore: Trim Galore! is a wrapper script to automate quality and adapter trimming as well as quality control, with some added functionality to remove biased methylation positions for RRBS sequence files (for directional, non-directional (or paired-end) sequencing). Software type: filtering
- permseq: An R package that performs multi-read mapping of ChIP-seq datasets. permseq works with bowtie and takes fastq files as input. It can work with just ChIP-seq data or, when other complementary data such as DNase-seq or histone ChIP-seq are available, it can use these data sources as prior information for multi-read mapping. The output from permseq is a text file of aligned reads available in bed, tagAlign, or bam formats.
- mosaics: An R package for TF and histone ChIP-seq analysis. mosaics takes as input the aligned files. It provides diagnostics plots for evaluating how well the mosaics model fits and allows FDR control. The mosaics-hmm module provides boundary adjusted broad peak calls. The output from mosaics is a set of peaks in a number of formats including bed. mosaics also generates intermediate data files/objects such as genome-wide read counts at the bin level for specified bin sizes, wig files for visualizing on the browser.
- atSNP: An R package for screening SNPs for their potential to enhance or disrupt transcription factor binding sites. atSNP accepts as input either SNP ids or the actual coordinates of the SNPs and the alternative alleles. It uses ENCODE motifs and JASPAR motifs to evaluate the regulatory potential of the SNPs; however, it also allows user specified set of transcription factor binding sites in the form of position specific matrices. It outputs for each SNP the significance of the match to each position specific matrix with both the reference and the alternative allele and also the significance of the change in these match scores. atSNP also provides easy visualization of the SNP impact on the binding site by composite logo plots.
- wigToBigWig: The bigWig format is for display of dense, continuous data that will be displayed in the Genome Browser as a graph. BigWig files are created initially from wiggle (wig) type files, using the program wigToBigWig. The resulting bigWig files are in an indexed binary format. The main advantage of the bigWig files is that only the portions of the files needed to display a particular region are transferred to UCSC, so for large data sets bigWig is considerably faster than regular wiggle files. Software type: file format conversion
- Median Absolute Deviation: Calculates the Median Absolute Deviation (MAD) and correlation of two gene quantifications from replicate RNA-seq experiments. A measure of reproducibility, inversely correlated with data quality. Software type: quality metric
- Tophat BAM Repair: tophat_bam_xsA_tag_fix.pl was written by x wei to allow the use of tophat 2.0.8 in the ENCODE pipelines. It reads a bam file generated from paired-end fastqs by tophat 2.0.8 and corrects the XS:A:+ or XS:A:- tags showing read strandedness. Software type: other
- Concat-fastq: Concat-fastq is an applet available for DNAnexus that concatenates a set of fastqs that should be merged for analysis. Software type: other
- bedToBigBed: bedToBigBed takes a standard bed file or a non-standard bed file with associated .as file to create a compressed bigBed version. Description of Big Binary Indexed (BBI) files and visualization of next-generation sequencing experiment results explained by W.J. Kent, PMCID: PMC2922891. Software type: file format conversion
- Makepseudoreplicates: Generate pseudoreplicates for self-consistency tests. Software type: other
- WGBS output processor: Convert a Bismark CX_report file to bed-like files. Software type: other
- Samtools: Samtools is a suite of programs for interacting with high-throughput sequencing data. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, a variant caller, and an alignment viewer, and thus provides universal tools for processing read alignments (PMID: 19505943). Software type: other
- Picard: A set of tools (in Java) for working with next generation high-throughput sequencing (HTS) data in the BAM format. Picard is implemented using the HTSJDK Java library, which supports access to common file formats, such as SAM and VCF, used for high-throughput sequencing data. There is currently no published paper for the Picard software. Software type: filtering, other
- Bismark: A tool to map bisulfite converted sequence reads and determine cytosine methylation states. The output produced by Bismark discriminates between cytosines in CpG, CHG and CHH context and enables bench scientists to visualize and interpret their methylation data soon after the sequencing run is completed (PMID: 21493656). Software type: other
- Flux Capacitor: FluxCapacitor reconstructs abundances of known transcript forms from RNA-seq data (PMCID: PMC3836232). Software type: transcript identification
- BWA: BWA is a software package for mapping low-divergent sequences based on a Burrows-Wheeler index against a large reference genome, such as the human genome. The publication for the short-read alignment component is PMID: 19451168, while PMID: 20080505 outlines the algorithm to align sequences >200bp up to 1Mb. Software type: aligner
- npIDR: Non-parametric Irreproducible Detection Rate (npIDR) essentially takes a pooled sample of all replicas and computes (a) the frequency of seeing count=x; (b) the frequency of seeing count=x given that in *ALL* other replicas the count is equal to zero. The original Irreproducible detection rate statistical test is published at DOI: 10.1214/11-AOAS466. Software type: quality metric
- FastQC: FastQC aims to provide a simple way to do some quality control checks on raw sequence data coming from high throughput sequencing pipelines. It provides a modular set of analyses which you can use to give a quick impression of whether your data has any problems of which you should be aware before doing any further analysis. Babraham Bioinformatics Web site: http://www.bioinformatics.babraham.ac.uk/projects/fastqc/ Software type: quality metric
- bedGraphToBigWig: Convert bedGraph to bigWig file. Description of Big Binary Indexed (BBI) files and visualization of next-generation sequencing experiment results explained by W.J. Kent, PMCID: PMC2922891. Software type: file format conversion
- WASP: WASP is a software package for two related tasks: (1) correcting allelic bias in mapped sequencing reads and (2) identifying molecular quantitative trait loci (QTLs) using next-generation sequencing data (e.g. gene expression QTLs or histone mark QTLs). The WASP mapper works with any read mapping pipeline that outputs BAM or SAM format. WASP identifies molecular QTLs using a statistical test that combines information about the total depth and allelic imbalance of mapped reads. WASP can call QTLs with very small sample sizes (as few as 10) compared to traditional QTL mapping approaches. Software type: aligner, variant annotation
- Webgestalt: WebGestalt is a "WEB-based GEne SeT AnaLysis Toolkit". It is designed for functional genomic, proteomic and large-scale genetic studies from which a large number of gene lists (e.g., differentially expressed gene sets, co-expressed gene sets, etc.) are continuously generated. WebGestalt incorporates information from different public resources and provides an easy way for biologists to make sense out of gene lists.
- RuleFit3: RuleFit3 is a predictive learning method and interpretational tool. It is based on general regression and classification models, which are constructed as linear combinations of simple rules derived from the data. Each rule consists of a conjunction of a small number of simple statements concerning the values of individual input variables (doi:10.1214/07-AOAS148).
- Mfinder: mfinder is a software tool for network motif detection. Network motifs are defined as basic interaction patterns that recur throughout biological networks, much more often than in random networks. To detect network motifs, mfinder implements two methods: a full enumeration of subgraphs and a sampling of subgraphs for estimation of subgraph concentrations. mfinder generates random networks based on the switching method, the stubs method and the "Go with the winners" algorithm.
- lumi package: The lumi package in R provides an integrated solution for Illumina microarray data analysis. It includes functions for Illumina BeadStudio (GenomeStudio) data input, quality control, BeadArray-specific variance stabilization, normalization and gene annotation at the probe level. It also includes functions for processing Illumina methylation microarrays, especially Illumina Infinium methylation microarrays.
- King: KING is a rapid algorithm for relationship inference using high-throughput genotype data typical of GWAS that allows the presence of an unknown population substructure. The relationship of any pair of individuals can be precisely inferred by robust estimation of their kinship coefficient, independent of sample composition or population structure (sample invariance). KING performs properly even under extreme population stratification, while algorithms assuming a homogeneous population give systematically biased results. KING performs relationship inference on millions of pairs of individuals in the order of minutes.
- Java Treeview: Java Treeview is an open source, cross-platform gene expression visualization tool and an interactive display of clustered gene expression data, similar to Eisen's treeview. It is also an extensible starting point for other gene expression visualization tools.
- HiveR: The hive plot is a visualization method for drawing networks. Nodes are mapped to and positioned on radially distributed linear axes. Edges are drawn as curved links. Hive plots can give a quantitative understanding of important aspects of a network's structure. Hive plots can also manage the visual complexity arising from a large number of edges and expose both trends and outlier patterns in a network structure.
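
As a small illustration of the task the SeparateReadpairs entry describes (splitting an interleaved file into two paired FASTQs), here is a minimal Python sketch. It is not the ENCODE tool's code. It assumes strict four-line FASTQ records with mates alternating (read 1, then read 2), and the function names `read_fastq_records` and `split_interleaved` are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: (header, sequence, separator, quality).
Record = Tuple[str, str, str, str]


def read_fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Yield 4-line FASTQ records, raising on truncated or malformed input."""
    it = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(it, 4)]
        if not chunk:
            return  # clean end of input
        if (len(chunk) != 4
                or not chunk[0].startswith("@")
                or not chunk[2].startswith("+")):
            raise ValueError(f"malformed FASTQ record: {chunk!r}")
        yield (chunk[0], chunk[1], chunk[2], chunk[3])


def split_interleaved(lines: Iterable[str]) -> Tuple[List[Record], List[Record]]:
    """Split interleaved records into (read 1 records, read 2 records).

    Assumption: records alternate strictly as mate 1, mate 2, mate 1, ...
    """
    records = list(read_fastq_records(lines))
    if len(records) % 2 != 0:
        raise ValueError("odd number of records; input is not fully paired")
    return records[0::2], records[1::2]


if __name__ == "__main__":
    demo = [
        "@pair1/1\n", "ACGT\n", "+\n", "IIII\n",
        "@pair1/2\n", "TTGA\n", "+\n", "IIII\n",
    ]
    read1, read2 = split_interleaved(demo)
    print(len(read1), len(read2))
```

For real files, the same functions can be fed an open file handle, and each returned record written back out as four newline-terminated lines to the two output files.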