ENCODE Software
All software used or developed by the ENCODE Consortium
Showing 100 of 352 results
- snrna_pseudobulk: Scripts for generating gene quantifications for pseudobulks. Software type: quantification, filtering
- subset-bam: subset-bam is a tool to subset a 10x Genomics BAM file based on a tag, most commonly the cell barcode tag. Software type: filtering
- CellBender: CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data. Software type: other
- Distal Regulation E-G correlation: Computes correlation metrics between DNase-seq signal at cCREs and either DNase-seq signal at gene promoters or RNA expression levels of genes.
- Distal regulation ENCODE-rE2G: Trains ENCODE-rE2G models on CRISPR enhancer screen data and applies them to generate genome-wide predictions of enhancer-gene regulatory connections.
- mex_gene_archive: mex_gene_archive is a minimal file format designed to meet the needs of archiving sparse gene matrices in a format compatible with the ENCODE 4 Data Coordination Center. Software type: other
- OpenMiChrom: Chromatin dynamics simulation software, used to create an ensemble of 3D structures from the sequence annotations (BED file) produced by PyMEGABASE.
- PyMEGABASE: PyMEGABASE is used to generate sequence annotations at the compartment and subcompartment level for physical modeling annotations.
- PROcapNet Model Zoo Pipeline: Software for BPNet models using PRO-cap data. Software type: machine learning
- TF ChIP-seq BPNet Model Zoo Pipeline: Placeholder description. Software type: machine learning
- ATAC-seq DNase-seq ChromBPNet Model Zoo Pipeline: Placeholder description. Software type: machine learning
- ChromBPNet: ChromBPNet is a fully convolutional neural network that uses dilated convolutions with residual connections to enable large receptive fields with efficient parameterization. Software type: machine learning
- pyranges: GenomicRanges for Python.
- pandas: pandas is a fast, powerful, flexible and easy-to-use open-source data analysis and manipulation tool, built on top of the Python programming language.
- Swan: Swan is a Python library designed for the analysis and visualization of transcriptomes. Software type: other
- ABC-Enhancer-Gene-Prediction: Cell-type-specific enhancer-gene predictions using the ABC model (Fulco, Nasser et al., Nature Genetics 2019).
- EPIraction: The EPIraction algorithm uses Tikhonov-regularized least-squares models to predict interacting promoter-enhancer pairs.
- AnalyzeSpearATAC: Software used to analyze the Greenleaf lab's SpearATAC (perturbation followed by snATAC-seq) data.
- Cerberus: Cerberus software for long-read RNA-seq analysis. Software type: other
- CRISPRi-FlowFISH: Software for the analysis of CRISPRi-FlowFISH data from the Engreitz lab.
- GraphReg: GraphReg (Chromatin interaction aware gene regulatory modeling with graph attention networks) is a graph neural network-based gene regulation model that integrates DNA sequence, 1D epigenomic data (such as chromatin accessibility and histone modifications), and 3D chromatin conformation data (such as Hi-C, HiChIP, Micro-C, HiCAR) to predict gene expression in an informative way.
- HiCDCPlus: The HiCDCPlus package provides methods to determine significant and differential chromatin interactions using a negative binomial generalized linear model, as well as implementations of TopDom to call topologically associating domains (TADs) and Juicer eigenvector to find A/B compartments. The package vignette explains its use and demonstrates typical workflows on Hi-C and HiChIP data.
- Zerone: Zerone discretizes several ChIP-seq replicates simultaneously and resolves conflicts between them. Publication available at: doi:10.1093/bioinformatics/btw336
- GEM-Tools: GEM-Tools is a C API and a Python module to support and simplify usage of the GEM Mapper.
- Fastx Toolkit: The FASTX-Toolkit is a collection of command-line tools for preprocessing short-read FASTA/FASTQ files.
- TRACE: Transcription Factor Footprinting Using DNase I Hypersensitivity Data and DNA Sequence.
- 3d-dna: We begin with a series of iterative steps whose goal is to eliminate misjoins in the input scaffolds. Each step begins with a scaffold pool (initially, this pool is the set of input scaffolds themselves). The scaffolding algorithm is used to order and orient these scaffolds. Next, the misjoin correction algorithm is applied to detect errors in the scaffold pool, creating an edited scaffold pool. Finally, the edited scaffold pool is used as input for the next iteration. The ultimate effect of these iterations is to reliably detect misjoins in the input scaffolds without removing correctly assembled sequence. After this process is complete, the scaffolding algorithm is applied to the revised input scaffolds, and the output, a single “megascaffold” that concatenates all the chromosomes, is retained for post-processing.
- bioraddbg ATAC-seq MACS2: This Docker container provides an easy-to-use Docker interface to MACS2 for peak calling, with settings tailored for Bio-Rad Single Cell ATAC-seq chemistry.
- bioraddbg ATAC-seq filter beads: This Docker container provides an easy-to-use Docker interface to a bead filtration tool, with settings tailored for Bio-Rad Single Cell ATAC-seq chemistry. This container takes in BAM files and performs "knee calling" to compute a bead barcode whitelist and a Jaccard index threshold for bead-to-droplet merging.
- bioraddbg ATAC-seq BWA: This Docker container provides an easy-to-use Docker interface to the BWA alignment tool, with settings tailored for Bio-Rad ATAC-seq chemistry.
- bioraddbg ATAC-seq deconvolute: This Docker container provides an easy-to-use Docker interface to the BAP tool, with settings tailored for Bio-Rad ATAC-seq chemistry.
- guppy_basecaller: Ont-Guppy is basecalling software available to Oxford Nanopore customers. For more information, see https://nanoporetech.com/
- polyAsite_workflow: Pipeline to infer poly(A) site clusters by processing 3' end sequencing libraries prepared according to various protocols.
- gencode_utr_fix: This package fixes UTR features in the third column of GENCODE GTF files by converting UTR annotations into five_prime_utr and three_prime_utr, similar to Ensembl.
- interpretation_samples: Interpretation code for Segway samples that produces classifier output and diagnostic plots from apply_samples.py for test samples. Software type: genome segmentation
- split-pipe: The Parse Biosciences computational pipeline is an out-of-the-box software tool that you can run locally to convert FASTQ files directly into processed data (including gene-cell count matrices). Customers purchasing the Whole Transcriptome Kit receive access to the Parse computational pipeline.
- PRINSEQ Lite: PRINSEQ preprocesses genomic or metagenomic sequence data in FASTA or FASTQ format.
- sQTLseekeR: sQTLseekeR is a package to detect splicing QTLs (sQTLs), which are variants associated with changes in the splicing pattern of a gene. In sQTLseekeR, splicing patterns are modeled by the relative expression of the transcripts of a gene. The most recent version of sQTLseekeR can be used to detect genetic variants associated with any multivariate phenotype. Software type: variant annotation
- ggsashimi: A command-line tool for the visualization of splicing events across multiple samples. Given a specified genomic region, ggsashimi creates sashimi plots for individual RNA-seq experiments as well as aggregated plots for groups of experiments. It uses popular bioinformatics file formats, is annotation-independent, and allows the visualization of splicing events even for large genomic regions by scaling down the genomic segments between splice sites. It is implemented in Python and internally generates R code for plotting. Software type: visualization
- Library sequencing match: In-house script that matches the guides (from an input list) to the FASTQ files returned by deep sequencing. Software type: quantification
- eFORGE: eFORGE identifies tissue- or cell type-specific signal by analysing a minimum set of 5 differentially methylated positions (DMPs) for overlap with DNase I hypersensitive sites (DHSs), compared with matched background DMPs, and provides both graphical and tabulated outputs. Software type: integrated analysis
- GenomeStudio: Software developed by Illumina for analysis of microarray data. Software type: other
- CRISPR screen peak calling: Takes CASA output and produces an ENCODE standard element quantification file. Software type: file format conversion
- CRISPR screen track builder: Takes guide quantifications and builds a perturbation signal file for browser tracks. Software type: quantification
- ptools_bin: A data-sanitization tool that allows raw functional genomics reads to be shared while minimizing privacy leakage, enabling principled privacy-utility trade-offs. Software type: other
- SCREEN: SCREEN is a web-based visualizer for the ENCODE Registry of cCREs. Users can search for cCREs by genomic region or by associated features such as genes and SNPs, and can also visualize associated underlying annotations from the ground and integrative levels of the ENCODE Encyclopedia such as gene expression, TF ChIP-seq peaks, chromatin states, and cCRE-target gene links. Additionally, users can access ENCODE data on the functional characterization of cCREs.
- apricot: apricot implements submodular optimization for the purpose of summarizing massive data sets into minimally redundant subsets that are still representative of the original data. These subsets are useful both for visualizing the modalities in the data and for training accurate machine learning models with just a fraction of the examples and compute.
- CRADLE: CRADLE (Correcting Read counts and Analysis of DifferentiaLly Expressed regions) is a package that was developed to analyze STARR-seq data. CRADLE removes technical biases from sonication, PCR, mappability and G-quadruplex structures, and generates bigWig files with corrected read counts. CRADLE then uses those corrected read counts to detect both activated and repressed enhancers, helping to find enhancers with better accuracy and credibility.
- Croo: Croo is a Python package for organizing outputs from Cromwell. Croo parses metadata.json, an output of Cromwell, and creates an organized directory containing a copy (or a soft link) of each output file, as described in an output definition JSON file specified by --out-def-json. Software type: framework
- Caper: Caper (Cromwell Assisted Pipeline ExecutoR) is a wrapper Python package for Cromwell. Caper is based on Unix and cloud platform CLIs (curl, gsutil and aws) and provides an easier way of running Cromwell in server and run modes by automatically composing the input files Cromwell needs. Caper also supports automatic file transfer between local and cloud storage (local paths, s3://, gs:// and http(s)://). You can use these URIs in an input JSON file or for the WDL file itself. Software type: framework
- encode_utils: Tools that are useful to any ENCODE Consortium submitting group, as well as the general community working with ENCODE data. The library and scripts are written in Python.
- Check Files: Each file is checked to see whether its MD5 sum (computed on both the gzipped and the ungzipped content) matches the submitted metadata, and is also run through the validateFiles program from Jim Kent's source utilities.
- liftOver: This UCSC tool converts genome coordinates and genome annotation files between assemblies.
- eCLIP core pipeline: Custom software developed by the Yeo lab for use in the eCLIP pipeline. Software type: other
- fastq-tools: A collection of small and efficient programs for performing some common and uncommon tasks with FASTQ files. Software type: other
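The Check Files entry describes comparing a file's MD5 sum, for both the gzipped and the ungzipped content, against the submitted metadata. A minimal Python sketch of that comparison, using only the standard library, is shown here. The function names are hypothetical, this is not the actual Check Files code, and it does not reproduce the separate validateFiles step.

```python
import gzip
import hashlib


def _md5_of_stream(fh, chunk_size=1 << 20):
    """MD5 hex digest of a binary stream, read in chunks to bound memory use."""
    digest = hashlib.md5()
    for chunk in iter(lambda: fh.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def check_md5(path, expected_md5, expected_content_md5=None):
    """Compare a gzipped file's MD5 sums with values from submitted metadata.

    expected_md5 is the MD5 of the file as stored (gzipped).
    expected_content_md5 (optional) is the MD5 of the decompressed content.
    Returns a list of mismatch messages; an empty list means all checks passed.
    """
    problems = []
    # Checksum of the compressed bytes exactly as they sit on disk.
    with open(path, "rb") as fh:
        observed = _md5_of_stream(fh)
    if observed != expected_md5:
        problems.append(f"gzipped md5 mismatch: {observed} != {expected_md5}")
    # Checksum of the decompressed bytes, streamed through gzip.
    if expected_content_md5 is not None:
        with gzip.open(path, "rb") as fh:
            observed = _md5_of_stream(fh)
        if observed != expected_content_md5:
            problems.append(
                f"ungzipped md5 mismatch: {observed} != {expected_content_md5}"
            )
    return problems
```

Checking the ungzipped content as well as the stored bytes matters because the same data can be recompressed into a different gzip byte stream (for example with a different compression level or header timestamp), which changes the first checksum but not the second.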
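The gencode_utr_fix entry describes relabeling GENCODE's generic UTR features as five_prime_utr or three_prime_utr. One common way to do this, which is not necessarily how gencode_utr_fix itself works, is to compare each UTR segment's coordinates with the transcript's outermost CDS coordinates and then take the strand into account. The sketch below assumes tab-separated GTF lines that carry a transcript_id attribute, and it assumes every UTR belongs to a transcript with at least one CDS record. Both helper functions are hypothetical.

```python
import re

# Pulls the transcript_id value out of the GTF attribute column (column 9).
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def classify_utr(utr_start, utr_end, cds_start, cds_end, strand):
    """Label one UTR segment as 5' or 3' from its position relative to the CDS.

    cds_start/cds_end are the outermost CDS coordinates of the transcript.
    """
    left_of_cds = utr_end < cds_start    # lower genomic coordinates than the CDS
    right_of_cds = utr_start > cds_end   # higher genomic coordinates than the CDS
    if strand == "+":
        if left_of_cds:
            return "five_prime_utr"
        if right_of_cds:
            return "three_prime_utr"
    elif strand == "-":
        # On the minus strand, transcription runs toward lower coordinates.
        if right_of_cds:
            return "five_prime_utr"
        if left_of_cds:
            return "three_prime_utr"
    raise ValueError(f"cannot classify UTR {utr_start}-{utr_end} on strand {strand!r}")


def fix_utr_lines(lines):
    """Rewrite column 3 of UTR records; all other lines pass through unchanged."""
    records = [line.rstrip("\n") for line in lines]
    # First pass: outermost CDS coordinates for each transcript.
    cds_span = {}
    for rec in records:
        if not rec or rec.startswith("#"):
            continue
        fields = rec.split("\t")
        if fields[2] == "CDS":
            tid = TRANSCRIPT_ID.search(fields[8]).group(1)
            start, end = int(fields[3]), int(fields[4])
            lo, hi = cds_span.get(tid, (start, end))
            cds_span[tid] = (min(lo, start), max(hi, end))
    # Second pass: relabel each UTR record using its transcript's CDS span.
    out = []
    for rec in records:
        if rec and not rec.startswith("#"):
            fields = rec.split("\t")
            if fields[2] == "UTR":
                tid = TRANSCRIPT_ID.search(fields[8]).group(1)
                cds_start, cds_end = cds_span[tid]
                fields[2] = classify_utr(
                    int(fields[3]), int(fields[4]), cds_start, cds_end, fields[6]
                )
                rec = "\t".join(fields)
        out.append(rec)
    return out
```

Two passes are used so that UTR records can appear before their transcript's CDS records in the file.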