ENCODE Software
All software used or developed by the ENCODE Consortium
- mex_gene_archive (source): mex_gene_archive is a minimal file format designed to meet the needs of archiving sparse gene matrices in a format compatible with the ENCODE 4 Data Coordination Center. Software type: other
- Swan: Swan is a Python library designed for the analysis and visualization of transcriptomes. Software type: other
- Cerberus: Cerberus software for long-read RNA-seq analysis. Software type: other
- GenomeStudio (source): Software developed by Illumina for analysis of microarray data. Software type: other
- ptools_bin (source): A data-sanitization software allowing raw functional genomics reads to be shared while minimizing privacy leakage, enabling principled privacy-utility trade-offs. Software type: other
- eCLIP core pipeline (source): Custom software developed by Yeo lab for use in the eCLIP pipeline. Software type: other
- fastq-tools (source): A collection of small and efficient programs for performing some common and uncommon tasks with FASTQ files. Software type: other
- scPOST (source): Simulation of single-cell datasets for power analyses that estimate power to detect cell state frequency shifts between conditions (e.g. an expansion of a cell state in disease vs. healthy), as described in our manuscript “Maximizing statistical power to detect clinically associated cell states with scPOST”. Software type: other
- cdr3-QTL (source): We tested associations between HLA genotypes and TCR-CDR3 amino acid compositions. We treated the amino acid composition of CDR3 as a quantitative trait, and tested its association with HLA genotype; we call this CDR3 quantitative trait loci analysis (cdr3-QTL), as described in our manuscript “HLA autoimmune risk alleles restrict the hypervariable region of T cell receptors”. Software type: other
- Imperio (source): This software includes (i) DeepBoost, a gradient boosting method for constructing boosted deep learning annotations by integrating deep learning allelic-effect annotations with fine-mapped SNPs; (ii) tools to combine these deep learning annotations with SNP-to-gene (S2G) linking strategies and relevant gene sets; and (iii) Imperio, a method for integrating deep learning annotations with S2G strategies to predict gene expression in whole blood and construct allelic-effect annotations based on changes in predicted expression. Applications of these 3 approaches to blood-related traits are described in our manuscript “Integrative approaches to improve the informativeness of deep learning models for human complex diseases”. Software type: other
- GSSG (source): GSSG consists of tools to generate enhancer-driven and master-regulator gene scores in blood, and combine these gene scores with distal and proximal SNP-to-gene (S2G) linking strategies to construct SNP annotations for blood-related traits, as described in our manuscript “Unique contribution of enhancer-driven and master-regulator genes to autoimmune disease revealed using functionally informed SNP-to-gene linking strategies”. Software type: other
- WashU Epigenome Browser (source): The WashU Epigenome Browser provides visualization, integration and analysis tools for epigenomic datasets. Since 2010, it has provided the scientific community with data from large consortia including the Roadmap Epigenomics and the ENCODE projects. Browser features include: (i) visualization using virtual reality (VR), which has implications in biology education and the study of 3D chromatin structure; (ii) expanded public data hubs, including data from the 4DN, ENCODE, Roadmap Epigenomics, TaRGET, IHEC and TCGA consortia; (iii) a more responsive user interface; (iv) a history of interactions, which enables undo and redo; (v) a feature we call Live Browsing, which allows multiple users to collaborate remotely on the same session; (vi) the ability to visualize local tracks and data hubs. Amazon Web Services also hosts the browser at https://epigenomegateway.org/. Software type: database, other
- Ascertained Sequentially Markovian Coalescent (ASMC) (source): ASMC is a method for inferring pairwise coalescence times implicating regions under negative selection that are enriched for disease heritability (Palamara et al. 2018 Nat Genet). Software type: other
- Signed LD profile (SLDP) regression (source): Signed LD profile regression is a method for identifying genome-wide directional effects of signed functional annotations on diseases and complex traits, as described in our manuscript “Detecting genome-wide directional effects of transcription factor binding on polygenic disease risk”. Software type: other
- Generate SIRV GTFs (source): Set of scripts used to generate GTFs that include SIRV sequences for use with the ENCODE long read RNA-seq pipeline. Software type: other
- RBNS Pipeline (source): From Burge lab (Freese, P.): "The RBNS pipeline is a set of bioinformatics tools to analyze data from high-throughput sequencing experiments of protein-bound RNAs. The current version includes read splitting, calculation of kmer frequencies and enrichments, QC metrics, production of motif sequence logos, and RNA secondary structure analysis." Software type: other
- Surrogate Variable Analysis (source): The sva package in Bioconductor contains functions for removing batch effects and other unwanted variation in high-throughput experiments. Software type: other
- Pysam: Python module wrapping the htslib C-API and samtools for accessing SAM-formatted alignment files. Software type: other
- PyLiftover (source): PyLiftover is a library for quick and easy conversion of genomic (point) coordinates between different assemblies. It uses the same logic and coordinate conversion mappings as the UCSC liftOver tool. Software type: other
- dnase-index-bwa (source): Indexes the reference genome with BWA, and processes mappability and the exclusion list (previously "blacklist") for the DNase ENCODE uniform processing pipeline. Software type: other
- snow (source): The snow package provides support for simple parallel computing on a network of workstations using R. A master R process calls makeCluster to start a cluster of worker processes; the master process then uses functions such as clusterCall and clusterApply to execute R code on the worker processes and collect and return the results on the master. This framework supports many forms of "embarrassingly parallel" computations. Software type: other
- SeparateReadpairs (source): Splits an interleaved FASTQ file into two valid paired FASTQ files. Software type: other
- gnuplot (source): Gnuplot is a portable command-line driven graphing utility for Linux, OS/2, MS Windows, OSX, VMS, and many other platforms. The source code is copyrighted but freely distributed (i.e., you don't have to pay for it). It was originally created to allow scientists and students to visualize mathematical functions and data interactively, but has grown to support many non-interactive uses such as web scripting. It is also used as a plotting engine by third-party applications like Octave. Gnuplot has been supported and under active development since 1986. Software type: other
- Preseq (source): From the Smith lab: "The preseq package is aimed at predicting and estimating the complexity of a genomic sequencing library, equivalent to predicting and estimating the number of redundant reads from a given sequencing depth and how many will be expected from additional sequencing using an initial sequencing experiment." Software type: other
- srna-index-star (source): Indexing of the reference genome by STAR for the small-RNA-seq ENCODE uniform processing pipeline. Software type: other
- lrna-index-tophat (source): Indexing of the reference genome by TopHat for the bulk-RNA-seq ENCODE uniform processing pipeline. Software type: other
- DNA-me pipeline (source): This is the git repository for all DNA methylation (WGBS) uniform pipeline code run on DNAnexus by the ENCODE DCC. Software type: other
- Tophat BAM Repair (source): tophat_bam_xsA_tag_fix.pl was written by x wei to allow the use of TopHat 2.0.8 in the ENCODE pipelines. It reads a BAM file generated from paired-end FASTQs by TopHat 2.0.8 and corrects the XS:A:+ or XS:A:- tags showing read strandedness. Software type: other
- Concat-fastq (source): Concat-fastqs is an applet available for DNAnexus to concatenate a set of FASTQs that should be merged for analysis. Software type: other
- Makepseudoreplicates: Generates pseudoreplicates for self-consistency tests. Software type: other
- WGBS output processor (source): Converts a Bismark CX_report file to BED-like files. Software type: other
- Samtools (source): Samtools is a suite of programs for interacting with high-throughput sequencing data. SAMtools implements various utilities for post-processing alignments in the SAM format, such as indexing, variant calling, and alignment viewing, and thus provides universal tools for processing read alignments (PMID:19505943). Software type: other
- Picard (source): A set of tools (in Java) for working with next generation high-throughput sequencing (HTS) data in the BAM format. Picard is implemented using the HTSJDK Java library, which supports access to common file formats, such as SAM and VCF, used for high-throughput sequencing data. There is currently no published paper for the Picard software. Software type: filtering, other
- Bismark (source): A tool to map bisulfite-converted sequence reads and determine cytosine methylation states. The output produced by Bismark discriminates between cytosines in CpG, CHG and CHH context and enables bench scientists to visualize and interpret their methylation data soon after the sequencing run is completed (PMID: 21493656). Software type: other
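
Several entries above describe simple FASTQ manipulations, for example splitting an interleaved file into two paired files (SeparateReadpairs). The sketch below illustrates that general idea in plain Python. It is not the SeparateReadpairs code: the function names `read_fastq_records` and `split_interleaved` are hypothetical, and it assumes four-line FASTQ records in which mates strictly alternate (read 1, then read 2).

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def read_fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield FASTQ records as lists of four lines (header, sequence, '+', qualities)."""
    # Drop line endings and skip blank lines (e.g. a trailing newline at end of file).
    cleaned = (line.rstrip("\n") for line in lines if line.strip())
    while True:
        record = list(islice(cleaned, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def split_interleaved(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split interleaved FASTQ lines into (read 1 lines, read 2 lines).

    Assumes records alternate strictly: mate 1, mate 2, mate 1, mate 2, ...
    """
    read1: List[str] = []
    read2: List[str] = []
    records = read_fastq_records(lines)
    for first in records:
        second = next(records, None)
        if second is None:
            raise ValueError("odd number of records; input is not interleaved pairs")
        read1.extend(first)
        read2.extend(second)
    return read1, read2


if __name__ == "__main__":
    # Tiny made-up example: one read pair, interleaved.
    example = ["@r1/1", "ACGT", "+", "IIII", "@r1/2", "TGCA", "+", "JJJJ"]
    r1, r2 = split_interleaved(example)
    print("\n".join(r1))
    print("\n".join(r2))
```

For real files, the same function can be fed an open file handle, and each returned list written to its own output file. A production tool would also verify that the two headers in each pair refer to the same read.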