Streamlined RNA Capture for Gene Expression and Novel Fusions
App Note / Case Study
Published: February 29, 2024
Credit: Twistbio
While total RNA sequencing (RNA-seq) offers an impartial glimpse into cellular transcription, analysis can be complicated by highly abundant non-coding transcripts such as ribosomal RNA, intronic sequences originating from pre-mRNA, and potential genomic DNA contaminants.
This application note presents capture sequencing using a novel RNA exome panel as a powerful approach to specifically target the transcriptome to profile gene expression and discover novel fusion genes.
Download this app note to learn more about:
- The “exon-aware” design strategy
- How exome capture compares to whole transcriptome sequencing and 3’-counting
- Targeted RNA sequencing on low-mass inputs and FFPE samples
Efficient, Exon-Aware RNA Capture for Gene Expression and Novel Fusions
TECH NOTE
INTRODUCTION
Total RNA sequencing (RNA-seq) provides a relatively unbiased view of the transcriptional state of a population of cells. However, most total RNA-seq experiments must contend with a large number of reads that are not helpful for gene-expression analysis, including reads from highly abundant non-coding transcripts like ribosomal RNA, intronic reads from pre-mRNA, and reads from contaminating genomic DNA. Target enrichment provides a way to focus sequencing on the informative parts of the genome, allowing for more sensitive detection of low-abundance transcripts, or for profiling only specific genes of interest.
Here we present capture sequencing experiments using Twist’s new RNA Exome panel, which uses a novel design strategy to specifically target every protein-coding isoform in GENCODE v41 Basic. Although the design natively targets the transcriptome, our design strategy also places probes to minimize design bias and allow for discovery of novel fusion genes. We evaluate panel performance in expression quantification, showing that relative transcript abundances are preserved after hybrid capture. This allows for accurate and reproducible quantification of transcripts whose abundances span many orders of magnitude. We show gains in sequencing efficiency from our targeted approach and demonstrate the ability to capture novel structural variants, such as the RNA fusions common in cancers. Additionally, we discuss our bioinformatic approach to evaluating capture performance for RNA, along with specific challenges in the analysis of RNA-seq experiments. In summary, we provide evidence that the Twist Targeted Enrichment for Gene Expression solution is an effective way to efficiently profile gene expression and detect gene fusions.
Figure 1. (A) Schematic of gene fusion detection using naive tiling across RNA transcripts. Dotted lines indicate mismatches. (B) Schematic of gene fusion detection with the exon-aware tiling strategy.
RESULTS
DESIGN STRATEGY AND CONTENT
Our first step in generating the RNA exome was to decide on both a content curation strategy and a strategy for designing capture probes against a transcript. Content curation was performed using the GENCODE gene definitions (v41 on hg38); our aim was to focus the design on the coding regions of protein-coding genes. To this end, we restricted the total defined coding sequence (CDS) space in GENCODE to categories of genes that were either protein-coding or had strong evidence for coding content in certain situations. In addition, we covered the 3’ and 5’ untranslated regions of some genes (such as those involved in recurrent fusions) to ensure that the panel had maximum sensitivity to these events. From these genes, we chose to tile a set of well-described transcript models, with the aim of natively covering the majority of isoforms that are of general interest to most researchers. Importantly, the content selected from these transcript models constitutes the set of high-confidence exons within these genes. To avoid capturing either contaminating genomic DNA or pre-mRNA, we did not target flanking intronic sequences of genes.
We thus decided on a tiling strategy that directly targeted the mature mRNA forms of transcripts. The naive approach to covering these transcripts would be to tile them with probes end-to-end (Figure 1A). However, this has the drawback of biasing capture towards known isoforms and against fusion transcripts. Instead, we employed a new “exon-aware” design strategy that avoids placing probes across exon-exon boundaries (Figure 1B). By doing this, we can ensure that novel isoforms or fusion transcripts can be detected efficiently, as the probes do not select for known exon-exon junctions.
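To make the difference between the two tiling strategies concrete, the following Python sketch tiles a toy three-exon transcript both ways and checks which probes straddle an exon-exon junction. The exon lengths, the 120-nt probe length, the non-overlapping step, and the truncation of the last probe in each exon are illustrative assumptions only, not details of the actual panel design.

```python
from typing import List, Tuple

# Half-open [start, end) interval in spliced-transcript coordinates.
Tile = Tuple[int, int]


def junction_positions(exon_lengths: List[int]) -> List[int]:
    """Transcript coordinates of the internal exon-exon junctions."""
    positions, pos = [], 0
    for length in exon_lengths[:-1]:
        pos += length
        positions.append(pos)
    return positions


def naive_tiles(exon_lengths: List[int], probe_len: int = 120) -> List[Tile]:
    """Tile the spliced transcript end to end, ignoring exon boundaries."""
    total = sum(exon_lengths)
    return [(s, min(s + probe_len, total)) for s in range(0, total, probe_len)]


def exon_aware_tiles(exon_lengths: List[int], probe_len: int = 120) -> List[Tile]:
    """Tile each exon separately so that no probe crosses a junction.

    Assumption for illustration: the last probe in an exon is simply
    truncated at the exon end; a real design would need its own policy.
    """
    tiles, offset = [], 0
    for length in exon_lengths:
        for s in range(0, length, probe_len):
            tiles.append((offset + s, offset + min(s + probe_len, length)))
        offset += length
    return tiles


def crosses_junction(tile: Tile, junctions: List[int]) -> bool:
    """True if a junction lies strictly inside the tile."""
    start, end = tile
    return any(start < j < end for j in junctions)


if __name__ == "__main__":
    exons = [150, 90, 200]  # hypothetical exon lengths in nucleotides
    junctions = junction_positions(exons)
    for name, tiler in (("naive", naive_tiles), ("exon-aware", exon_aware_tiles)):
        tiles = tiler(exons)
        crossing = [t for t in tiles if crosses_junction(t, junctions)]
        print(name, "tiles:", tiles, "crossing a junction:", crossing)
```

For this toy transcript, one naive probe (positions 120 to 240) spans the junction at position 150, while none of the exon-aware probes span a junction. A probe that spans a known junction matches a fusion or novel-isoform transcript only up to the breakpoint, which is the mismatch situation depicted in Figure 1A.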
After tiling the design using the exon-aware strategy above, we collapsed exact duplicate probes and removed probes with low sequence complexity and/or homology towards non-coding RNAs that would reduce sequencing efficiency (i.e., mitochondrial and nuclear ribosomal RNAs and tRNAs). With this design finalized, we used Twist’s DNA printing technology to synthesize our probes using our standard target enrichment panel process.
DOC-001385 REV 1.0
COMPARISON OF EXOME CAPTURE TO WTS AND 3’-COUNTING
In addition to targeted sequencing, the common workflows for assessing gene expression are whole transcriptome sequencing (WTS), which uses random priming to select a relatively unbiased set of transcripts from ribosomal RNA-depleted RNA, and 3’-counting, which uses an oligo-dT primer to isolate the 3’-ends of polyadenylated transcripts (primarily mRNAs). Broadly, the benefit of performing WTS is that the user gets a relatively unbiased view of the transcriptome, at the expense of losing a substantial number of reads to introns and other relatively uninformative areas of the genome (Figure 2A). Correspondingly, 3’-counting is more efficient at selecting exonic regions (CDS and UTR), but displays a strong bias towards the 3’-ends of transcripts (Figure 2B). This would be expected to impact the ability to detect different isoforms, as only part of the transcript is profiled for longer genes.
To address these issues, we designed the RNA exome panel to profile the entire CDS of protein-coding transcripts, which achieves a measured 3’ bias and duplicate rate similar to what is observed with WTS (Figure 2B). We also carefully excluded intronic and highly expressed non-coding sequences from the design, which allows us to focus reads more efficiently into exons than either WTS or 3’-counting (Figures 2A, 2B).
Selecting for mature transcripts by hybrid capture had other advantages as well: the percentage of reads derived from the incorrect strand was reduced compared to either 3’-counting or WTS (Figure 2B).
Since transcripts exist in concentrations that span roughly six orders of magnitude, we next asked whether RNA hybrid capture was equally efficient in enriching for both lowly and highly expressed transcripts. To do this, we correlated the counts from a WTS run to the counts obtained from an RNA exome capture in the same sample type (Figure 2C). We found that enrichment was consistent across the entire range of expression, indicating that the capture system was not saturated even for highly expressed transcripts.
Figure 2A. Genomic distribution of reads in RNA exome capture, whole transcriptome sequencing (WTS), and 3’-counting.
Figure 2B. Comparison of sequencing metrics between RNA exome capture, WTS and 3’-counting.
Figure 2C. Correlation of uncaptured counts (x-axis) to captured counts (y-axis) for protein-coding genes.
Figure 2D. Correlation of two technical replicate captures performed with the RNA exome.
PERFORMANCE OF THE TWIST RNA EXOME ON LOW-MASS INPUTS AND FFPE SAMPLES
Formalin-fixed paraffin-embedded (FFPE) tissue is tissue that has been preserved for histology. Although this process damages nucleic acids, FFPE tissue is nonetheless often used for RNA-seq because the samples are readily available as clinical specimens. As previous applications of RNA capture in the literature have focused on FFPE samples (Jang et al. 2021, Pennock et al. 2019, Vahrenkamp et al. 2019), we evaluated the performance of the RNA exome on FFPE samples at three different mass inputs (1 ng, 10 ng and 100 ng).
Since the RNA exome selects efficiently for coding content, we also examined 5 levels of read sampling between 10M and 30M reads for both WTS and the RNA exome to establish an approximate equivalence between the reads required to detect a particular number of genes in each workflow. We looked both at coding genes as detected by alignment and standard feature counting (Figure 3A) and at the number of detected isoforms using a k-mer-based approach (Figure 3B). In both cases, a cutoff of 5 supporting reads was used to define detection.
Our results show that the RNA exome dramatically improves the number of detected coding genes and transcripts at all mass levels. At high mass inputs, the RNA exome detects a similar number of coding genes with 15M sampled reads as WTS does with 30M reads. The results were particularly striking for 1 ng of FFPE input, where the target enrichment (TE) sample detected a number of coding genes comparable to that of TE samples at higher input quantities, while even 30M reads in the WTS workflow failed to detect approximately 1,000 low-expressed genes that were measurable with TE (Figure 3A). The patterns for the number of detected transcripts were similar, with a measurable increase at all levels of read sampling that was more striking for low mass inputs. For 1 ng of FFPE, we detected a comparable number of transcripts with 10M reads using the RNA exome as with 30M reads using WTS (Figure 3B).
Since FFPE RNA tends to be highly fragmented, we asked whether target enrichment might be able to select for a subset of less degraded sequences. We sequenced an FFPE sample using both whole transcriptome sequencing and the RNA exome, and plotted the inferred size distribution of fragments based on alignment directly to RNA transcripts. The size distribution showed a clear upward shift for the RNA exome sample (Figure 3C), indicating that the RNA exome does indeed select for more intact material.
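The detection analysis described earlier in this section (downsample to a fixed read depth, then call a gene detected when it has at least 5 supporting reads) can be sketched in Python as follows. The read representation (one gene ID per assigned read), the function name, the fixed random seed, and the toy depths are assumptions for illustration; the note does not describe the actual tooling used for downsampling or feature counting.

```python
import random
from collections import Counter
from typing import List, Sequence


def detected_genes(read_gene_ids: Sequence[str], depth: int,
                   min_reads: int = 5, seed: int = 0) -> int:
    """Count genes with at least `min_reads` reads after downsampling.

    `read_gene_ids` holds one gene ID per assigned read (hypothetical
    input format). Reads are sampled without replacement to `depth`.
    """
    if depth > len(read_gene_ids):
        raise ValueError("depth exceeds the number of available reads")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sampled = rng.sample(list(read_gene_ids), depth)
    counts = Counter(sampled)
    return sum(1 for n in counts.values() if n >= min_reads)


def detection_curve(read_gene_ids: Sequence[str], depths: List[int],
                    min_reads: int = 5) -> List[int]:
    """Number of detected genes at each requested sampling depth."""
    return [detected_genes(read_gene_ids, d, min_reads) for d in depths]


if __name__ == "__main__":
    # Toy library: one abundant gene, one moderate gene, one rare gene.
    reads = ["GENE_A"] * 100 + ["GENE_B"] * 10 + ["GENE_C"] * 3
    for depth, n in zip([20, 60, 113], detection_curve(reads, [20, 60, 113])):
        print(f"{depth} reads -> {n} genes detected")
```

Running the same function on the read assignments from each workflow at matched depths gives curves of the kind summarized in Figure 3A. In the toy library, GENE_C can never be detected because it has only 3 reads in total, which mirrors how low-expressed genes fall below the cutoff when too few reads land on coding content.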
Finally, we wanted to get a sense of how robust RNA exome capture was to FFPE material of differing quality. We thus extracted RNA from 5 commercially available FFPE standards, and subjected these samples to both whole-transcriptome sequencing and capture with the RNA exome. We find that the RNA exome is able to significantly increase the number of detected genes for all tested samples, irrespective of their level of degradation (Figure 3D).
Figure 3A. Number of protein coding genes detected (y-axis) with different levels of downsampling (x-axis) in 3 different mass inputs of FFPE material. [Plot panels: 1 ng, 10 ng, and 100 ng FFPE RNA input; x-axis: millions of reads (10 to 30); y-axis: number of detected coding genes; series: RNA exome, WTS.]
Figure 3B. Number of protein coding transcripts (including different isoforms of the same gene) detected (y-axis) with different levels of downsampling (x-axis) in 3 different mass inputs of FFPE material. [Plot panels: 1 ng, 10 ng, and 100 ng FFPE RNA input; x-axis: millions of reads (10 to 30); y-axis: number of detected coding transcripts; series: RNA exome, WTS.]
Figure 3C. Size distributions of captured (RNA exome) and uncaptured (WTS) reads from an FFPE sample. [Plot: x-axis: inferred insert size (bp); y-axis: fraction of reads; series: RNA exome, WTS.]
Figure 3D. Comparison of captured (RNA exome) and uncaptured (WTS) counts for a variety of FFPE samples with different integrity. [Plot: y-axis: number of detected genes; samples: H1 (DV200 = 57), H2 (DV200 = 60), H3 (DV200 = 67), H4 (DV200 = 73), H5 (DV200 = 76); series: RNA exome, WTS.]
DIFFERENTIAL EXPRESSION ANALYSIS WITH THE