AWARD SUMMARY for Christopher Vollmers, UCSC (R35GM133569)

Current Production

Improving Transcriptome Annotation

Project Summary/Abstract

Currently, transcriptome analysis by short-read RNA-seq [1] is a core component of research in nearly all fields of biology. However, even specialized RNA-seq protocols cannot comprehensively identify and quantify full-length RNA transcript isoforms. Nonetheless, isoform-level resolution is essential for transcriptome annotation of model and non-model organisms. While long-read sequencing technology has the potential to overcome this challenge, current leading approaches, such as Pacific Biosciences (PacBio) [2] or Oxford Nanopore Technologies (ONT) [3] long-read technologies, have their own drawbacks that limit their wide adoption for transcriptome annotation. For example, PacBio IsoSeq cannot generate the tens of millions of reads required to determine comprehensive isoform-level transcriptomes, while ONT cDNA and direct RNA sequencing methods cannot do so at the required accuracy [4]. To overcome these limitations, we propose to develop new short- and long-read sequencing and computational approaches. We are building on our experience with short-read technology [5] to develop a new short-read cDNA sequencing method that can identify transcription start sites, polyA sites, and splice sites with very high accuracy in a single easy-to-implement experiment. Currently, this requires three separate, complex experiments. Further, we are significantly improving our R2C2 method, which is already among the most capable long-read full-length cDNA sequencing methods currently available [6] (R2C2 improves on standard ONT approaches by increasing accuracy from 87% to 94% while producing more full-length cDNA sequences). To improve R2C2 further, we will increase its read accuracy to 98%, raise its effective throughput by at least a factor of 2, and make it possible to capture very long cDNA molecules.
To more accurately identify and quantify isoforms based on the resulting short- and long-read data, we will modify our Mandalorion [3,6] isoform identification software to take full advantage of both data types. Together, these advances will represent an integrated workflow for transcriptome annotation of unprecedented power. We will then apply this integrated workflow to improve transcriptome annotations of Homo sapiens and commonly used model organisms. Generating high-quality isoform-level transcriptome data for Homo sapiens, Rattus norvegicus, Mus musculus, Drosophila melanogaster, and Caenorhabditis elegans will create valuable resources for biomedical research and enable us to investigate how isoform diversity has evolved and is regulated.

Status: current
NIH Grant: R35GM133569
Principal Investigator: Christopher Vollmers, UCSC
Affiliated Labs

Dates: April 30, 2024 - April 30, 2024
Award RFA: community