Software tools used in integrative analysis for the development of the Encyclopedia and SCREEN
- SCREEN: SCREEN is a web-based visualizer for the ENCODE Registry of cCREs. Users can search for cCREs by genomic region or by associated features such as genes and SNPs. They can also visualize the underlying annotations from the ground and integrative levels of the ENCODE Encyclopedia, such as gene expression, TF ChIP-seq peaks, chromatin states, and cCRE-target gene links. Additionally, users can access ENCODE data on the functional characterization of cCREs.
- ChromImpute: ChromImpute is software for large-scale systematic epigenome imputation. It takes an existing compendium of epigenomic data and uses it to predict signal tracks for mark-sample combinations that have not been experimentally mapped, or to generate a potentially more robust version of data sets that have been mapped experimentally. ChromImpute bases its predictions on features from signal tracks of other marks mapped in the target sample and of the target mark in other samples, combining these features using an ensemble of regression trees.
- Avocado: Avocado is a multi-scale deep tensor factorization method for learning a latent representation of the human epigenome. The purpose of this model is twofold: first, to impute epigenomic experiments that have not yet been performed, and second, to learn a latent representation of the human epigenome that can be used as input for machine learning models in place of the epigenomic data itself.
- bedToBigBed: bedToBigBed takes a standard BED file, or a non-standard BED file with an associated .as file, and creates a compressed bigBed version. Big Binary Indexed (BBI) files and their use in visualizing next-generation sequencing results are described by W.J. Kent (PMCID: PMC2922891). Software type: file format conversion
- BEDTools: Collectively, the bedtools utilities are a Swiss-army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, and VCF. Software type: file format conversion
- Factorbook: Factorbook is a transcription factor (TF)-centric web-based repository of integrative analyses associated with ENCODE ChIP-seq data. It includes de novo discovered motifs, chromatin features surrounding ChIP-seq peaks (histone modification patterns, DNase I cleavage footprints, and nucleosome positioning profiles), deep-learned models of sequence features driving TF binding, and integration with GWAS variants and the ENCODE Registry of candidate cis-regulatory elements. Software type: database
- HaploReg: Explores annotations of the noncoding genome at variants on haplotype blocks, such as candidate regulatory SNPs at disease-associated loci. Under the Set Options tab, set the Browse ENCODE button to "on" and select an LD threshold and reference population. Under the Build Query tab, enter a SNP (rsXXXXX), a set of SNPs, or a genomic region, or select a GWAS from the drop-down menu. HaploReg returns the SNPs in LD with the query SNPs and their frequency in 4 populations from 1000 Genomes Phase 1. It also reports the evidence ENCODE has found for regulatory protein binding (mouse over to see the protein names), chromatin structure (mouse over to see the cell types with DNase hypersensitivity), the chromatin state of the region (which can predict an enhancer or promoter), and putative transcription factor binding motifs that are altered by the variant. Clicking the SNP name hyperlink reveals further details, including cell type metadata and the mechanism of disruption or creation of TF binding regulatory motifs (showing the matched PWM and its alignment to the local sequence context). SNPs are also intersected with cross-species conserved elements, chromatin states from the Roadmap Epigenomics Consortium, and lead eQTLs from the GTEx Project browser. Software type: database, variant annotation
- Genomedata: Efficiently stores multiple tracks of numeric data anchored to a genome. The format allows fast random access to hundreds of gigabytes of data while retaining a small disk space footprint. Utilities have also been developed to load data into this format. A reference implementation with Python and C components is available under the GNU General Public License.
- Segway: Uses a machine learning method to analyze multiple tracks of functional genomics data, searching for recurring patterns. The software automatically partitions the genome into non-overlapping segments and assigns each segment a label. The resulting annotation provides a human-interpretable summary of the functional landscape of the genome, yielding hypotheses about novel instances or classes of functional elements. Software type: genome segmentation
- Wiggler: Produces normalized genome-wide signal coverage tracks from raw read alignment files. It can pool replicate datasets while allowing replicate-specific and data-type-specific read shifting and smoothing parameters. It can be used to generate signal density maps for ChIP-seq, DNase-seq, FAIRE-seq, and MNase-seq data. Wiggler also implicitly models variability in mappability to appropriately normalize signal density and distinguish missing data from true zero signal.
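
The "genome arithmetic" mentioned in the BEDTools entry above (set operations on genomic intervals) can be illustrated with a small sketch. This is plain Python written for illustration only; it is not the bedtools implementation and does not call bedtools. The function names `merge` and `intersect`, the tuple layout, and the choice to merge intervals that merely touch are all assumptions of this example. Coordinates are assumed to be 0-based and half-open, as in BED files.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, start, end), 0-based half-open.
Interval = Tuple[str, int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Combine overlapping or touching intervals on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portion of every pair of intervals from a and b."""
    overlaps: List[Interval] = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # keep only non-empty overlaps
                overlaps.append((chrom_a, start, end))
    return sorted(overlaps)


if __name__ == "__main__":
    # Made-up coordinates, purely for demonstration.
    peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 80)]
    genes = [("chr1", 180, 250), ("chr2", 70, 120)]
    print(merge(peaks))
    print(intersect(merge(peaks), genes))
```

The pairwise loop in `intersect` is quadratic and is only meant to make the set-theory idea concrete; tools built for whole-genome data rely on sorted inputs or interval indexes instead.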