ENCODE Software
Software tools used in integrative analysis for the development of the Encyclopedia and SCREEN
- bedToBigBed — Takes a standard BED file, or a non-standard BED file with an associated .as file, and creates a compressed bigBed version. Big Binary Indexed (BBI) files and their use in visualizing next-generation sequencing experiment results are described by W.J. Kent (PMCID: PMC2922891). Software type: file format conversion
- WGBS output processor — Converts a Bismark CX_report file to BED-like files. Software type: other
- BEDTools — Collectively, the bedtools utilities are a Swiss Army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF, and VCF. Software type: file format conversion
- Factorbook — Factorbook is a transcription factor (TF)-centric web-based repository of integrative analysis associated with ENCODE ChIP-seq data. It includes de novo discovered motifs, chromatin features surrounding ChIP-seq peaks (histone modification patterns, DNase I cleavage footprints, and nucleosome positioning profiles), deep-learned models of sequence features driving TF binding, and integration with GWAS variants and the ENCODE Registry of candidate cis-regulatory elements. Software type: database
- HaploReg — Explores annotations of the noncoding genome at variants on haplotype blocks, such as candidate regulatory SNPs at disease-associated loci. Under the Set Options tab, set the Browse ENCODE button to "on" and select an LD threshold and reference population. Under the Build Query tab, enter a SNP (rsXXXXX), a set of SNPs, or a genomic region, or select a GWAS from the drop-down menu. HaploReg returns the SNPs in LD with the query SNPs and their frequencies in 4 populations from 1000 Genomes Phase 1. It also reports the evidence ENCODE has found for regulatory protein binding (mouse over to see the protein names), chromatin structure (mouse over to see the cell types with DNase hypersensitivity), the chromatin state of the region (which can predict an enhancer or promoter), and putative transcription factor binding motifs that are altered by the variant. Clicking on the SNP name hyperlink reveals further details, including cell type metadata and the mechanism of disruption or creation of TF binding regulatory motifs (showing the PWM matched and its alignment to the local sequence context). SNPs are also intersected with cross-species conserved elements, chromatin states from the Roadmap Epigenomics Consortium, and lead eQTLs from the GTEx Project browser. Software type: database, variant annotation
- Genomedata — Efficiently stores multiple tracks of numeric data anchored to a genome. The format allows fast random access to hundreds of gigabytes of data, while retaining a small disk space footprint. Utilities have also been developed to load data into this format. A reference implementation in Python and C components is available under the GNU General Public License.
- Segway — Uses a machine learning method to analyze multiple tracks of functional genomics data, searching for recurring patterns. The software automatically partitions the genome into non-overlapping segments and assigns each segment a label. The resulting annotation provides a human-interpretable summary of the functional landscape of the genome, yielding hypotheses about novel instances or classes of functional elements. Software type: genome segmentation
- Wiggler — Produces normalized genome-wide signal coverage tracks from raw read alignment files. Supports pooling of replicate datasets while allowing for replicate-specific and data-type-specific read shifting and smoothing parameters. It can be used to generate signal density maps for ChIP-seq, DNase-seq, FAIRE-seq, and MNase-seq data. Wiggler also implicitly models variability in mappability to appropriately normalize signal density and distinguish missing data from true zero signal.
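
As an illustration of the "genome arithmetic" mentioned in the BEDTools entry above, here is a minimal sketch in plain Python of two set operations on genomic intervals: merging and intersecting. It does not use or reproduce bedtools itself. The function names `merge` and `intersect`, the tuple layout `(chrom, start, end)`, and the sample coordinates are all assumptions made for this example. Coordinates are assumed to be BED-style (0-based, half-open).

```python
from collections import defaultdict

# Hypothetical representation: each interval is (chrom, start, end),
# using BED-style 0-based, half-open coordinates.

def merge(intervals):
    """Combine overlapping or directly adjacent intervals per chromosome."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged

def intersect(a, b):
    """Return the overlapping portion of every pair of intervals from a and b."""
    by_chrom = defaultdict(list)
    for chrom, start, end in b:
        by_chrom[chrom].append((start, end))
    overlaps = []
    for chrom, start, end in a:
        for b_start, b_end in by_chrom.get(chrom, []):
            lo, hi = max(start, b_start), min(end, b_end)
            if lo < hi:  # half-open intervals overlap only if lo < hi
                overlaps.append((chrom, lo, hi))
    return sorted(overlaps)

# Made-up sample data for illustration only.
peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 80)]
genes = [("chr1", 250, 400), ("chr2", 0, 60)]

merged_peaks = merge(peaks)
shared = intersect(merged_peaks, genes)
print(merged_peaks)
print(shared)
```

The pairwise loop in `intersect` is quadratic per chromosome, which is fine for a small illustration. For real datasets, a dedicated tool such as bedtools is the better choice.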