ENCODE Software
All software used or developed by the ENCODE Consortium
Showing 50 of 199 results
- Distal Regulation E-G correlation: Computes correlation metrics between the DNase-seq signal at cCREs and either the DNase-seq signal at gene promoters or the RNA expression levels of genes.
- Distal regulation ENCODE-rE2G: Train ENCODE-rE2G models on CRISPR enhancer screen data and apply them to generate genome-wide predictions of enhancer-gene regulatory connections.
- mex_gene_archive: mex_gene_archive is a minimal file format designed to meet the needs of archiving sparse gene matrices in a format compatible with the ENCODE 4 Data Coordination Center. Software type: other
- OpenMiChrom: Used to create an ensemble of 3D structures by chromatin dynamics simulation, taking as input the sequence annotations (BED file) from PyMEGABASE.
- PyMEGABASE: PyMEGABASE is used to generate sequence annotations at the compartment and subcompartment level for physical modeling annotations.
- PROcapNet Model Zoo Pipeline: Software for BPNet models using PRO-cap data. Software type: machine learning
- TF ChIP-seq BPNet Model Zoo Pipeline: Placeholder description. Software type: machine learning
- Swan: Swan is a Python library designed for the analysis and visualization of transcriptomes. Software type: other
- ABC-Enhancer-Gene-Prediction: Cell-type-specific enhancer-gene predictions using the ABC model (Fulco, Nasser et al., Nature Genetics 2019).
- EPIraction: The EPIraction algorithm uses Tikhonov-regularized least squares models to predict interacting promoter-enhancer pairs.
- AnalyzeSpearATAC: Software used to analyze the Greenleaf lab's SpearATAC (perturbation followed by snATAC-seq) data.
- Cerberus: Cerberus software for long-read RNA-seq analysis. Software type: other
- CRISPRi-FlowFISH: Software for the analysis of CRISPRi-FlowFISH data from the Engreitz lab.
- GraphReg: GraphReg (Chromatin interaction aware gene regulatory modeling with graph attention networks) is a graph neural network-based gene regulation model that integrates DNA sequence, 1D epigenomic data (such as chromatin accessibility and histone modifications), and 3D chromatin conformation data (such as Hi-C, HiChIP, Micro-C, HiCAR) to predict gene expression.
- HiCDCPlus: The HiCDCPlus package provides methods to determine significant and differential chromatin interactions using a negative binomial generalized linear model, as well as implementations of TopDom to call topologically associating domains (TADs) and of the Juicer eigenvector method to find A/B compartments. The package vignette explains its use and demonstrates typical workflows on Hi-C and HiChIP data.
- TRACE: Transcription Factor Footprinting Using DNase I Hypersensitivity Data and DNA Sequence
- 3d-dna: We begin with a series of iterative steps whose goal is to eliminate misjoins in the input scaffolds. Each step begins with a scaffold pool (initially, this pool is the set of input scaffolds themselves). The scaffolding algorithm is used to order and orient these scaffolds. Next, the misjoin correction algorithm is applied to detect errors in the scaffold pool, thus creating an edited scaffold pool. Finally, the edited scaffold pool is used as an input for the next iteration of the misjoin correction algorithm. The ultimate effect of these iterations is to reliably detect misjoins in the input scaffolds without removing correctly assembled sequence. After this process is complete, the scaffolding algorithm is applied to the revised input scaffolds, and the output, a single “megascaffold” that concatenates all the chromosomes, is retained for post-processing.
- sQTLseekeR: sQTLseekeR is a package to detect splicing QTLs (sQTLs), which are variants associated with changes in the splicing pattern of a gene. In sQTLseekeR, splicing patterns are modeled by the relative expression of the transcripts of a gene. The most recent version of sQTLseekeR can be used to detect genetic variants associated with any multivariate phenotype. Software type: variant annotation
- ggsashimi: A command-line tool for the visualization of splicing events across multiple samples. Given a specified genomic region, ggsashimi creates sashimi plots for individual RNA-seq experiments as well as aggregated plots for groups of experiments. It uses popular bioinformatics file formats, is annotation-independent, and allows the visualization of splicing events even for large genomic regions by scaling down the genomic segments between splice sites. It is implemented in Python and internally generates R code for plotting. Software type: visualization
- Library sequencing match: In-house script that matches the guides (from an input list) to the FASTQ files returned by deep sequencing. Software type: quantification
- eFORGE: eFORGE identifies tissue or cell type-specific signal by analysing a minimum set of 5 differentially methylated positions (DMPs) for overlap with DNase I hypersensitive sites (DHSs) compared to matched background DMPs and provides both graphical and tabulated outputs. Software type: integrated analysis
- GenomeStudio: Software developed by Illumina for analysis of microarray data. Software type: other
- CRISPR screen peak calling: Takes CASA output and makes an ENCODE standard element quantification file. Software type: file format conversion
- CRISPR screen track builder: Takes guide quantification and builds a browser track perturbation signal file. Software type: quantification
- ptools_bin: Data-sanitization software that allows raw functional genomics reads to be shared while minimizing privacy leakage, enabling principled privacy-utility trade-offs. Software type: other
- SCREEN: SCREEN is a web-based visualizer for the ENCODE Registry of cCREs. Users can search for cCREs by genomic region or by associated features such as genes and SNPs, and can also visualize associated underlying annotations from the ground and integrative levels of the ENCODE Encyclopedia such as gene expression, TF ChIP-seq peaks, chromatin states, and cCRE-target gene links. Additionally, users can access ENCODE data on the functional characterization of cCREs.
- apricot: apricot implements submodular optimization for the purpose of summarizing massive data sets into minimally redundant subsets that are still representative of the original data. These subsets are useful for both visualizing the modalities in the data and for training accurate machine learning models with just a fraction of the examples and compute.
- CRADLE: CRADLE (Correcting Read counts and Analysis of DifferentiaLly Expressed regions) is a package that was developed to analyze STARR-seq data. CRADLE removes technical biases from sonication, PCR, mappability, and G-quadruplex structure, and generates bigWig files with corrected read counts. CRADLE then uses those corrected read counts to detect both activated and repressed enhancers, which helps identify enhancers with better accuracy and credibility.
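
To illustrate the kind of calculation named in the Distal Regulation E-G correlation entry, here is a minimal sketch of Pearson and Spearman correlation between one cCRE's DNase-seq signal and one gene's RNA expression across the same biosamples. This is not the ENCODE implementation. The function names (`pearson`, `average_ranks`, `spearman`) and the numeric values are hypothetical, and the listing does not say which correlation metrics the actual software computes.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one signal is constant
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical signal values across five biosamples (illustration only).
ccre_dnase = [1.2, 3.4, 0.5, 8.1, 2.2]   # DNase-seq signal at one cCRE
gene_rna = [10.0, 25.0, 4.0, 90.0, 12.0]  # RNA expression of one gene

print("Pearson:", pearson(ccre_dnase, gene_rna))
print("Spearman:", spearman(ccre_dnase, gene_rna))
```

Spearman correlation depends only on rank order, so it is less sensitive than Pearson to a single biosample with an extreme signal value. That is one reason both metrics are often reported together.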