Additional details on data generation and analysis can be found in the paper supplement.
1. Protein-Coding Gene Annotation
The human, worm and fly protein-coding gene annotations are from GENCODE v10, an extension of WormBase WS220, and an extension of FlyBase 5.45, respectively.
- Human protein-coding gene annotation, in gtf format, from GENCODE v10: gen10_CDS+exons_only_protein-coding_only.gtf.gz
- Worm protein-coding gene annotation, in gtf format, from modENCODE June 2012 freeze: AG1201.integrated_transcripts_strictly_coding.ws220.gtf.gz
- Fly protein-coding gene annotation, in gtf format, from modENCODE June 2012 freeze: coding_Celniker_Drosophila_Annotation_20120616_1428.gtf.gz
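As a starting point for working with these files, the following is a minimal sketch of reading a gzip-compressed GTF file and counting feature types per gene. It assumes the files follow the standard nine-column, tab-separated GTF layout with a `gene_id` attribute; the function names are illustrative and not part of this release. Repeated attribute keys are overwritten, and semicolons inside quoted values are not handled.

```python
import gzip
from collections import defaultdict


def parse_gtf_line(line):
    """Split one GTF record into its standard columns plus an attribute dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated GTF columns")
    seqname, source, feature, start, end, score, strand, frame, attr_text = fields
    attributes = {}
    for item in attr_text.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        # Each attribute looks like: key "value"
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')
    return {
        "seqname": seqname,
        "source": source,
        "feature": feature,
        "start": int(start),  # GTF coordinates are 1-based, inclusive
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


def count_features_per_gene(lines):
    """Return {gene_id: {feature_type: count}} for an iterable of GTF lines."""
    counts = defaultdict(lambda: defaultdict(int))
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        record = parse_gtf_line(line)
        gene_id = record["attributes"].get("gene_id", "")
        counts[gene_id][record["feature"]] += 1
    return counts


def count_features_in_file(path):
    """Convenience wrapper for a .gtf.gz file on disk."""
    with gzip.open(path, "rt") as handle:
        return count_features_per_gene(handle)
```

For example, `count_features_in_file("gen10_CDS+exons_only_protein-coding_only.gtf.gz")` would return the per-gene counts of CDS and exon records, provided the file matches the assumed layout.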
2. Fly Strict Non-coding Gene Annotation
The fly non-coding gene annotation extends FlyBase 5.45.
- Fly strict non-coding annotation, in gtf format, from modENCODE June 2012 freeze: strict_noncoding_Celniker_Drosophila_Annotation_20120616_1428.gtf.gz
3. Comparable and Non-comparable Non-coding RNA Annotations
This section provides the compressed GTF files for non-coding RNA gene annotations. For each species, there is one compressed file containing the comparable ncRNA annotations (miRNA, tRNA, snoRNA, snRNA, pri-miRNA) and one containing the ncRNA annotations that are not comparable between organisms. The comparable annotations are further separated by biotype.
- Human comparable ncRNA, in gtf format: human_consensus_ncRNAs_03_23_2013.gtf.tar.bz2
- Human non-comparable ncRNA, in gtf format: human_non_consensus_ncRNAs_03_23_2013.gtf.tar.bz2
- Worm comparable ncRNA, in gtf format: worm_consensus_ncRNAs_03_23_2013.gtf.tar.bz2
- Worm non-comparable ncRNA, in gtf format: worm_non_consensus_ncRNAs_03_23_2013.gtf.tar.bz2
- Fly comparable ncRNA, in gtf format: fly_consensus_ncRNAs_03_23_2013.gtf.tar.bz2
- Fly non-comparable ncRNA, in gtf format: fly_non_consensus_ncRNAs_03_23_2013.gtf.tar.bz2
4. Human-Worm-Fly Ortholog Lists
We have compiled a complete list of ~28k triplets of orthologous genes among human, worm and fly (6353 unique genes in human, 5083 in worm, 4839 in fly). The list was merged from the MIT list and Ensembl. It contains all one-to-one, one-to-many and many-to-many orthologous relationships.
- MIT Human-Worm-Fly Orthologs: Modencode.merged.orth20120611_wfh_comm_all.csv
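The unique-gene counts and relationship types described above can be recomputed from a list of triplets. The sketch below is illustrative only: the column layout of the CSV file is not documented here, so `read_triplets` takes the column positions and header handling as hypothetical parameters. The relationship labels are assigned per gene pair from each gene's number of partners, which is a simplification of a full orthology-group analysis.

```python
import csv
from collections import defaultdict


def read_triplets(path, columns=(0, 1, 2), skip_header=True):
    """Read (human, worm, fly) triplets from a CSV file.

    ASSUMPTION: the column positions and the presence of a header row are
    hypothetical; adjust them after inspecting the actual file.
    """
    triplets = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        if skip_header:
            next(reader, None)
        for row in reader:
            triplets.append(tuple(row[i] for i in columns))
    return triplets


def unique_gene_counts(triplets):
    """Count distinct genes per species across all triplets."""
    humans, worms, flies = set(), set(), set()
    for human, worm, fly in triplets:
        humans.add(human)
        worms.add(worm)
        flies.add(fly)
    return {"human": len(humans), "worm": len(worms), "fly": len(flies)}


def classify_pairs(pairs):
    """Label each (a, b) gene pair as one-to-one, one-to-many or many-to-many."""
    pairs = list(pairs)
    a_to_b = defaultdict(set)
    b_to_a = defaultdict(set)
    for a, b in pairs:
        a_to_b[a].add(b)
        b_to_a[b].add(a)
    labels = {}
    for a, b in set(pairs):
        a_has_many = len(a_to_b[a]) > 1  # a has several partners
        b_has_many = len(b_to_a[b]) > 1  # b has several partners
        if a_has_many and b_has_many:
            labels[(a, b)] = "many-to-many"
        elif a_has_many or b_has_many:
            labels[(a, b)] = "one-to-many"
        else:
            labels[(a, b)] = "one-to-one"
    return labels
```

To classify human-worm relationships, pass the first two elements of each triplet, for example `classify_pairs((h, w) for h, w, _ in triplets)`.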
5. Table Summarizing Processed Expression Values for all Annotated Genes
These tables provide a summary of all annotations with processed expression values associated with protein-coding genes in human, worm, and fly. They also include TF prediction power, orthology, and other features. Details of the values and features are provided in the Excel sheet headers.
- Human coding gene details, in Excel format: human_gene.xlsx
- Worm coding gene details, in Excel format: worm_gene.xlsx
- Fly coding gene details, in Excel format: fly_gene.xlsx
6. Transcriptionally Active Regions (TARs)
TARs are regions of non-canonical transcription that lie outside protein-coding exons, annotated ncRNAs and pseudogenes. Listed below are all TAR locations in the genome of each species at 90% and 98% exon discovery rate thresholds, given as chromosome, start and stop. Details of TAR calling are in the supplement.
- TARs in human at 90% threshold, in bed format: human_exon_disc_90_tars.bed
- TARs in human at 98% threshold, in bed format: human_exon_disc_98_tars.bed
- TARs in worm at 90% threshold, in bed format: worm_exon_disc_90_tars.bed
- TARs in worm at 98% threshold, in bed format: worm_exon_disc_98_tars.bed
- TARs in fly at 90% threshold, in bed format: fly_exon_disc_90_tars.bed
- TARs in fly at 98% threshold, in bed format: fly_exon_disc_98_tars.bed
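To compare the 90% and 98% TAR sets, one simple summary is the number of bases covered per chromosome. The sketch below assumes the files follow the usual BED convention (whitespace-separated chromosome, 0-based start, exclusive end in the first three columns); the function names are illustrative. Overlapping intervals are merged so that no base is counted twice.

```python
from collections import defaultdict


def parse_bed(lines):
    """Yield (chrom, start, end) from BED lines, skipping header lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        yield fields[0], int(fields[1]), int(fields[2])


def covered_bases(intervals):
    """Return {chrom: bases covered}, merging overlapping intervals first."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    totals = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        covered = 0
        cur_start, cur_end = ivs[0]
        for start, end in ivs[1:]:
            if start <= cur_end:
                # Overlapping or adjacent: extend the current merged interval.
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
        totals[chrom] = covered
    return totals
```

Usage would look like `with open("human_exon_disc_98_tars.bed") as fh: totals = covered_bases(parse_bed(fh))`.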
7. Enhancers
Human enhancers are from Yip et al., “Classification of genomic regions based on experimentally-determined binding sites of more than 100 transcription-related factors in the whole human genome”, Genome Biol. 13:R48. Worm and fly enhancers were identified using enhancer-specific histone marks; see Ho et al. 2014.
- Human enhancers used for TAR analysis: Enhancers
- Worm and fly enhancers used for TAR analysis: Enhancers
- Alternative human enhancers: Enhancers
8. Clustering of ncRNAs and TARs with Co-expression Modules
For each species, we mapped the ncRNAs and TARs to modules based on co-expression correlations. ncRNAs that map strongly to a module may have functions related to those of the module's genes, so we can annotate them based on the module's functions.
- incRNAs and TARs associated with the 16 modules in three species, tarball of txt files: 16_module_ncRNA.tar.gz
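One way to picture the mapping step is to correlate each ncRNA or TAR expression profile with a representative profile of each module and keep the best match. The sketch below is a hypothetical illustration, not the procedure used to produce the files above: the use of the module mean profile, the Pearson correlation, and the `min_r` cutoff of 0.8 are all assumptions made for the example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / math.sqrt(sxx * syy)


def module_mean_profile(gene_profiles):
    """Average the expression profiles of a module's genes, sample by sample."""
    n = len(gene_profiles)
    return [sum(values) / n for values in zip(*gene_profiles)]


def assign_to_module(ncrna_profile, module_profiles, min_r=0.8):
    """Return (module, r) for the best-correlated module, or (None, None).

    ASSUMPTION: `min_r` is an arbitrary illustrative threshold.
    """
    best_module, best_r = None, min_r
    for name, gene_profiles in module_profiles.items():
        r = pearson(ncrna_profile, module_mean_profile(gene_profiles))
        if r >= best_r:
            best_module, best_r = name, r
    if best_module is None:
        return None, None
    return best_module, best_r
```

Here `module_profiles` maps a module name to the list of expression profiles of its genes, all measured over the same samples as `ncrna_profile`.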
9. Supervised ncRNA predictions (novel ncRNA fragments)
We applied the machine learning method, incRNA (Lu et al. “Prediction and characterization of noncoding RNAs in C. elegans by integrating conservation, secondary structure, and high-throughput sequencing and array data”. Genome Res. 21:245-54) to predict ncRNAs in the genomes of human, worm and fly.
- Human supervised ncRNA predictions Feb 6, 2013, in bed format: hg_incRNA_tar98_intersection_50_6Feb13.bed
- Worm supervised ncRNA predictions Feb 6, 2013, in bed format: ce_incRNA_tar98_intersection_50_6Feb13.bed
- Fly supervised ncRNA predictions Feb 6, 2013, in bed format: dm_incRNA_tar98_intersection_50_6Feb13.bed
10. Gene Co-expression Modules
We built co-expression modules by combining across-species orthology and within-species co-expression relationships between protein-coding genes. In the resulting multilayer network we searched for dense subgraphs using simulated annealing, following the Orthoclust methodology (Yan et al., Genome Biol. (2014) 15:R100). To focus on cross-species conserved functions, we restricted the clustering to orthologs, arriving at 16 conserved modules, which are enriched in a variety of functions ranging from morphogenesis to chromatin remodeling.
- 16 human, worm and fly co-expressed and co-evolved modules showing highly coordinated expression patterns only during the phylotypic stage, tarball of csv files: 16_module.tar.gz
- Gene names in 16 conserved modules: genelist_16modules.xlsx
- Enriched Gene Ontology terms on biological process in 16 conserved modules: GO_16modules_biological_process.xlsx
- Enriched Gene Ontology terms on cellular component in 16 conserved modules: GO_16modules_cellular_component.xlsx
- Enriched Gene Ontology terms on molecular function in 16 conserved modules: GO_16modules_molecular_function.xlsx
11. Developmental Stage Mapping between Worm and Fly
We used expression patterns to align the developmental stages of worm and fly, finding a novel pairing between the worm embryo and fly pupa stages, in addition to the expected embryo-to-embryo and larvae-to-larvae pairings. See Li et al., Genome Res. (2014) 24:1086-101 for more details.
- Embryo-specific worm genes aligned with fly genes in both the embryo and pupa stages: wf_dual_mapping.xls
12. Raw Data for the “Comparative Analysis of the Transcriptome across Distant Species”
Description of all datasets in different formats, and links to raw data (reads).