ENCODE Software
All software used or developed by the ENCODE Consortium
Showing 100 of 300 results
- Distal Regulation E-G correlation: Compute correlation metrics between DNase-seq signal at cCREs and either DNase-seq signal at gene promoters or RNA expression levels of genes.
- Distal regulation ENCODE-rE2G: Train ENCODE-rE2G models on CRISPR enhancer screen data and apply them to generate genome-wide predictions of enhancer-gene regulatory connections.
- PROcapNet Model Zoo Pipeline: Software for BPNet models using PRO-cap data. Software type: machine learning
- TF ChIP-seq BPNet Model Zoo Pipeline: Placeholder description. Software type: machine learning
- pyranges: GenomicRanges for Python.
- pandas: Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
- ABC-Enhancer-Gene-Prediction: Cell type-specific enhancer-gene predictions using the ABC model (Fulco, Nasser et al., Nature Genetics 2019).
- EPIraction: The EPIraction algorithm uses Tikhonov-regularized least squares models to predict interacting promoter-enhancer pairs.
- AnalyzeSpearATAC: Software used to analyze the Greenleaf lab's SpearATAC (perturbation followed by snATAC-seq) data.
- CRISPRi-FlowFISH: Software for the analysis of CRISPRi-FlowFISH data from the Engreitz lab.
- GraphReg: GraphReg (Chromatin interaction aware gene regulatory modeling with graph attention networks) is a graph neural network-based gene regulation model that integrates DNA sequence, 1D epigenomic data (such as chromatin accessibility and histone modifications), and 3D chromatin conformation data (such as Hi-C, HiChIP, Micro-C, HiCAR) to predict gene expression.
- HiCDCPlus: The package HiCDCPlus provides methods to determine significant and differential chromatin interactions by use of a negative binomial generalized linear model, as well as implementations of TopDom to call topologically associating domains (TADs) and of the Juicer eigenvector to find A/B compartments. This vignette explains the use of the package and demonstrates typical workflows on Hi-C and HiChIP data.
- Zerone: Zerone discretizes several ChIP-seq replicates simultaneously and resolves conflicts between them. Publication: doi:10.1093/bioinformatics/btw336
- GEM-Tools: GEM-Tools is a C API and a Python module to support and simplify usage of the GEM Mapper.
- Fastx Toolkit: The FASTX-Toolkit is a collection of command-line tools for preprocessing short-read FASTA/FASTQ files.
- TRACE: Transcription factor footprinting using DNase I hypersensitivity data and DNA sequence.
- bioraddbg ATAC-seq MACS2: This Docker container provides an easy-to-use Docker interface to MACS2 for peak calling, with settings tailored for Bio-Rad Single Cell ATAC-seq chemistry.
- bioraddbg ATAC-seq filter beads: This Docker container provides an easy-to-use Docker interface to a bead filtration tool, with settings tailored for Bio-Rad Single Cell ATAC-seq chemistry. The container takes in BAM files and performs "knee calling" to compute a bead barcode whitelist and a Jaccard index threshold for bead-to-droplet merging.
- bioraddbg ATAC-seq BWA: This Docker container provides an easy-to-use Docker interface to the BWA alignment tool, with settings tailored for Bio-Rad ATAC-seq chemistry.
- bioraddbg ATAC-seq deconvolute: This Docker container provides an easy-to-use Docker interface to the BAP tool, with settings tailored for Bio-Rad ATAC-seq chemistry.
- guppy_basecaller: Ont-Guppy is basecalling software available to Oxford Nanopore customers. For more information, please see https://nanoporetech.com/
- polyAsite_workflow: Pipeline to infer poly(A) site clusters through processing of 3' end sequencing libraries prepared according to various protocols.
- gencode_utr_fix: This package fixes UTR features in the third column of GENCODE GTF files by converting UTR annotations into five_prime_utr and three_prime_utr, similar to Ensembl.
- interpretation_samples: Interpretation code for Segway samples that produces classifier output and diagnostic plots from apply_samples.py for test samples. Software type: genome segmentation
- split-pipe: The Parse Biosciences computational pipeline is an out-of-the-box software tool that you can run locally to convert FASTQ files straight to processed data (including gene-cell count matrices). Customers purchasing the Whole Transcriptome Kit will receive access to the Parse computational pipeline.
- PRINSEQ Lite: PRINSEQ will preprocess genomic or metagenomic sequence data in FASTA or FASTQ format.
- sQTLseekeR: sQTLseekeR is a package to detect splicing QTLs (sQTLs), which are variants associated with changes in the splicing pattern of a gene. In sQTLseekeR, splicing patterns are modeled by the relative expression of the transcripts of a gene. The most recent version of sQTLseekeR can be employed to detect genetic variants associated with any multivariate phenotype. Software type: variant annotation
- ggsashimi: A command-line tool for the visualization of splicing events across multiple samples. Given a specified genomic region, ggsashimi creates sashimi plots for individual RNA-seq experiments as well as aggregated plots for groups of experiments. It uses popular bioinformatics file formats, is annotation-independent, and allows the visualization of splicing events even for large genomic regions by scaling down the genomic segments between splice sites. It is implemented in Python and internally generates R code for plotting. Software type: visualization
- Library sequencing match: In-house script that matches the guides (from an input list) to the FASTQ files returned by deep sequencing. Software type: quantification
- GenomeStudio: Software developed by Illumina for analysis of microarray data. Software type: other
- CRISPR screen peak calling: Takes CASA output and makes an ENCODE standard element quantification file. Software type: file format conversion
- CRISPR screen track builder: Takes guide quantification and builds a browser track perturbation signal file. Software type: quantification
- ptools_bin: A data-sanitization software allowing raw functional genomics reads to be shared while minimizing privacy leakage, enabling principled privacy-utility trade-offs. Software type: other
- SCREEN: SCREEN is a web-based visualizer for the ENCODE Registry of cCREs. Users can search for cCREs by genomic region or by associated features such as genes and SNPs, and can also visualize associated underlying annotations from the ground and integrative levels of the ENCODE Encyclopedia such as gene expression, TF ChIP-seq peaks, chromatin states, and cCRE-target gene links. Additionally, users can access ENCODE data on the functional characterization of cCREs.
- apricot: apricot implements submodular optimization for the purpose of summarizing massive data sets into minimally redundant subsets that are still representative of the original data. These subsets are useful for both visualizing the modalities in the data and for training accurate machine learning models with just a fraction of the examples and compute.
- CRADLE: CRADLE (Correcting Read counts and Analysis of DifferentiaLly Expressed regions) is a package that was developed to analyze STARR-seq data. CRADLE removes technical biases from sonication, PCR, mappability and G-quadruplex structure, and generates bigWig files with corrected read counts. CRADLE then uses those corrected read counts to detect both activated and repressed enhancers. CRADLE will help find enhancers with better accuracy and credibility.
- Croo: Croo is a Python package for organizing outputs from Cromwell. Croo parses metadata.json, which is an output from Cromwell, and makes an organized directory with a copy (or a soft link) of each output file as described in an output definition JSON file specified by --out-def-json. Software type: framework
- Caper: Caper (Cromwell Assisted Pipeline ExecutoR) is a wrapper Python package for Cromwell. Caper is based on Unix and cloud platform CLIs (curl, gsutil and aws) and provides an easier way of running Cromwell server/run modes by automatically composing the necessary input files for Cromwell. Caper also supports automatic file transfer between local and cloud storage (local path, s3://, gs:// and http(s)://). You can use these URIs in an input JSON file or for a WDL file itself. Software type: framework
- Check Files: Files are checked to confirm that the MD5 sum (for both gzipped and ungzipped content) matches the submitted metadata, and are also run through the validateFiles program from Jim Kent's source utilities.
- liftOver: This UCSC tool converts genome coordinates and genome annotation files between assemblies.
- fastq-tools: A collection of small and efficient programs for performing some common and uncommon tasks with FASTQ files. Software type: other
- mpraflow-tsv-to-bed: This is a one-line custom Perl script used to generate a BED-format file from a TSV file.
- FASTQ read-name correction: A script that resolves FASTQ read-name inconsistencies.
- scPOST: Simulation of single-cell datasets for power analyses that estimate power to detect cell state frequency shifts between conditions (e.g. an expansion of a cell state in disease vs. healthy), as described in our manuscript “Maximizing statistical power to detect clinically associated cell states with scPOST”. Software type: other
- cdr3-QTL: We tested associations between HLA genotypes and TCR-CDR3 amino acid compositions. We treated the amino acid composition of CDR3 as a quantitative trait, and tested its association with HLA genotype; we call this CDR3 quantitative trait loci analysis (cdr3-QTL), as described in our manuscript “HLA autoimmune risk alleles restrict the hypervariable region of T cell receptors”. Software type: other
- Imperio: This software includes (i) DeepBoost, a gradient boosting method for constructing boosted deep learning annotations by integrating deep learning allelic-effect annotations with fine-mapped SNPs; (ii) tools to combine these deep learning annotations with SNP-to-gene (S2G) linking strategies and relevant gene sets; and (iii) Imperio, a method for integrating deep learning annotations with S2G strategies to predict gene expression in whole blood and construct allelic-effect annotations based on changes in predicted expression. Applications of these 3 approaches to blood-related traits are described in our manuscript “Integrative approaches to improve the informativeness of deep learning models for human complex diseases”. Software type: other
- GSSG: GSSG consists of tools to generate enhancer-driven and master-regulator gene scores in blood, and combine these gene scores with distal and proximal SNP-to-gene (S2G) linking strategies to construct SNP annotations for blood-related traits, as described in our manuscript “Unique contribution of enhancer-driven and master-regulator genes to autoimmune disease revealed using functionally informed SNP-to-gene linking strategies”. Software type: other
- AnnotBoost: AnnotBoost is a gradient boosting-based framework to impute and denoise Mendelian disease-derived pathogenicity scores to improve their informativeness for common disease, as described in our manuscript “Improving the informativeness of Mendelian disease-derived pathogenicity scores for common disease”. Software type: variant annotation
- ChromImpute: ChromImpute is software for large-scale systematic epigenome imputation. ChromImpute takes an existing compendium of epigenomic data and uses it to predict signal tracks for mark-sample combinations not experimentally mapped or to generate a potentially more robust version of data sets that have been mapped experimentally. ChromImpute bases its predictions on features from signal tracks of other marks that have been mapped in the target sample and of the target mark in other samples, with these features combined using an ensemble of regression trees.
- Cell Ranger: Cell Ranger is a set of analysis pipelines that process Chromium single-cell RNA-seq output to align reads, generate feature-barcode matrices and perform clustering and gene expression analysis (mkfastq, count, aggr, and reanalyze).
- gemBS: gemBS is a high-performance bioinformatic pipeline designed for high-throughput analysis of DNA methylation data from whole-genome bisulfite sequencing (WGBS). It combines GEM3, a high-performance read aligner, and bs_call, a high-performance variant and methylation caller, into a streamlined and efficient pipeline for bisulfite sequence analysis.
- mountainClimber: mountainClimber is a method for de novo identification of alternative transcript start sites and polyadenylation sites in RNA-seq data. Software type: transcript identification
- WashU Epigenome Browser: The WashU Epigenome Browser provides visualization, integration and analysis tools for epigenomic datasets. Since 2010, it has provided the scientific community with data from large consortia including the Roadmap Epigenomics and the ENCODE projects. Browser features include: (i) visualization using virtual reality (VR), which has implications in biology education and the study of 3D chromatin structure; (ii) expanded public data hubs, including data from the 4DN, ENCODE, Roadmap Epigenomics, TaRGET, IHEC and TCGA consortia; (iii) a more responsive user interface; (iv) a history of interactions, which enables undo and redo; (v) a feature we call Live Browsing, which allows multiple users to collaborate remotely on the same session; (vi) the ability to visualize local tracks and data hubs. Amazon Web Services also hosts the browser at https://epigenomegateway.org/. Software type: database, other
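
As a rough illustration of the kind of MD5 comparison described in the Check Files entry above, the following Python sketch checks a gzipped file against two expected checksums: one for the file as stored and one for its uncompressed content. This is not the ENCODE implementation. The function names `md5_hex` and `check_md5` and the result keys are hypothetical, chosen only for this example, and the sketch does not run validateFiles.

```python
import gzip
import hashlib


def md5_hex(stream, chunk_size=1 << 20):
    """Return the hex MD5 digest of a binary stream, read in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def check_md5(path, expected_md5, expected_content_md5=None):
    """Compare a file's MD5 sums with expected values (hypothetical helper).

    expected_md5 is the checksum of the file as stored on disk.
    expected_content_md5, if given, is the checksum of the uncompressed
    content and assumes that path points to a gzip-compressed file.
    """
    with open(path, "rb") as handle:
        results = {"md5sum": md5_hex(handle) == expected_md5}
    if expected_content_md5 is not None:
        # gzip.open decompresses on the fly, so the digest covers the
        # uncompressed bytes rather than the compressed file.
        with gzip.open(path, "rb") as handle:
            results["content_md5sum"] = md5_hex(handle) == expected_content_md5
    return results
```

Reading in fixed-size chunks keeps memory use flat for large sequencing files. A caller would pass the checksums recorded in the submitted metadata and treat any `False` value in the returned dictionary as a mismatch.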