Background: Intron retention (IR), where specific introns are retained in polyadenylated mature messenger RNA, is a mode of alternative splicing that modulates gene expression in normal physiology and in cancer, including haematological malignancies. Existing applications (e.g. IRFinder, MISO, rMATS) measure differential IR, but none offers a streamlined approach for IR analysis of large numbers of samples from publicly available RNA-sequencing databases.
Method: We created NxtIRF, a versatile R package designed to streamline differential IR analysis downstream of our existing in-house algorithm, IRFinder [1]. NxtIRF uses an enhanced algorithm to reduce false over-calling of IR, and creates data structures to analyse up to 1000 samples. NxtIRF also streamlines data visualization, including volcano plots, principal component analysis (PCA), heatmap generation, hierarchical clustering, and annotation track representation of IR events. Additionally, we outline an algorithm that sums transcriptome-wide IR, a measurement used to stratify samples for IR-associated differential gene expression analysis. NxtIRF and IRFinder were used to process polyA-enriched mRNA-seq data from acute myeloid leukaemia (AML, n=133) and diffuse large B-cell lymphoma (DLBCL, n=48) samples from The Cancer Genome Atlas. IR levels were compared with unpaired normal bone marrow [2] and lymphoid tissue [3], respectively.
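The stratification idea described above can be illustrated with a minimal Python sketch. It is not the NxtIRF implementation (NxtIRF is an R package, and the abstract does not specify its summation algorithm). The sketch assumes a per-intron IR ratio of intronic depth divided by intronic plus spliced depth, summarises each sample as the mean ratio over adequately covered introns, and splits samples at the median. The function names, the `min_coverage` threshold, the choice of mean and median, and the toy depth values are all hypothetical.

```python
from statistics import mean, median


def ir_ratio(intron_depth, splice_depth):
    """Assumed IR ratio: intronic depth / (intronic depth + spliced depth)."""
    total = intron_depth + splice_depth
    return intron_depth / total if total > 0 else None


def transcriptome_wide_ir(introns, min_coverage=10):
    """Hypothetical per-sample summary: mean IR ratio over covered introns.

    `introns` is a list of (intron_depth, splice_depth) pairs; introns with
    total depth below `min_coverage` (an assumed threshold) are skipped.
    """
    ratios = [
        ir_ratio(intron_depth, splice_depth)
        for intron_depth, splice_depth in introns
        if intron_depth + splice_depth >= min_coverage
    ]
    return mean(ratios) if ratios else None


def stratify(scores):
    """Split sample IDs into (high, low) groups around the median score."""
    valid = {sid: s for sid, s in scores.items() if s is not None}
    cutoff = median(valid.values())
    high = sorted(sid for sid, s in valid.items() if s > cutoff)
    low = sorted(sid for sid, s in valid.items() if s <= cutoff)
    return high, low


# Toy depths for four hypothetical samples (not real data).
samples = {
    "S1": [(5, 95), (20, 80)],
    "S2": [(50, 50), (30, 70)],
    "S3": [(1, 99), (2, 3)],   # second intron falls below min_coverage
    "S4": [(40, 60), (10, 90)],
}
scores = {sid: transcriptome_wide_ir(depths) for sid, depths in samples.items()}
high, low = stratify(scores)
print("high IR:", high)
print("low IR:", low)
```

The resulting high and low groups would then serve as the two conditions in a standard differential gene expression comparison.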
Outcome: Consistent with a previous report [4], IR was predominantly increased in AML compared to normal bone marrow. In contrast, DLBCL exhibited dramatically reduced IR compared to B lymphocytes in normal lymphoid tissue. This observation is striking, as breast cancer was the only tumour type previously reported to have decreased IR in tumour versus matched normal tissue [4]. PCA demonstrated that DLBCL expresses two distinct IR-expression signatures, suggesting that different sets of parent genes are alternately regulated through reduced IR. Distinct genes were differentially expressed in DLBCL and AML when samples were stratified using our transcriptome-wide IR parameter.
Conclusion: NxtIRF facilitates downstream differential IR analysis of large data sets, making it well suited to the bioinformatic study of IR in mRNA-seq data from large cancer databases.