Single-cell RNA sequencing (scRNAseq) makes it possible to characterise individual cell types within heterogeneous samples, enabling gene expression profiling at high resolution. Beyond the limitations inherent to short-read sequencing, most scRNAseq technologies resolve only the 3’ end of RNA transcripts, which makes it difficult or even impossible to characterise full-length molecules. Long-read sequencing platforms offer a solution to these problems, but their error rate is higher than that of Illumina sequencing. Consequently, it is difficult to accurately demultiplex cell barcodes and unique molecular identifiers (UMIs) from long-read scRNAseq data.
Here, we describe an unsupervised method to demultiplex full-length transcripts from single cells using Oxford Nanopore sequencing. We vectorise the raw signal corresponding to each read's cell barcode by comparing it with a reference set using dynamic time warping; a growing neural gas then maps the topology of the resulting N-dimensional feature space and clusters reads accordingly. Applying a similar process to UMIs groups reads derived from the same molecule, from which a more accurate consensus sequence of the transcript can be generated. Together, these steps overcome two significant technical limitations and increase the resolution and accuracy of scRNAseq.
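To make the pipeline concrete, the sketch below shows one possible reading of it, under stated assumptions rather than as the authors' implementation. It assumes that "vectorising with a reference set" means taking the dynamic time warping (DTW) distance from a read's barcode signal to each of N reference signals, which gives an N-dimensional feature vector. It then fits a minimal growing neural gas (GNG) using the standard update rules introduced by Fritzke, and labels each read by the connected component of its nearest GNG node. Connected-component clustering is a common way to read clusters off a GNG graph; whether the method uses it is an assumption. The function names (`dtw_distance`, `vectorise`, `growing_neural_gas`, `cluster_reads`), all GNG parameter values and the synthetic "squiggles" are hypothetical and chosen only for illustration.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D signals (absolute-difference cost)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(float(a[i - 1]) - float(b[j - 1]))
            cost[i, j] = step + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def vectorise(signal, references):
    """Hypothetical feature map: DTW distance from one barcode signal to each of N references."""
    return np.array([dtw_distance(signal, ref) for ref in references])


def growing_neural_gas(X, max_nodes=20, n_iter=2000, eps_b=0.05, eps_n=0.006,
                       max_age=50, lam=100, alpha=0.5, beta=0.995, seed=0):
    """Minimal GNG sketch; removal of isolated nodes is omitted for brevity."""
    rng = np.random.default_rng(seed)
    W = [X[i].astype(float) for i in rng.choice(len(X), size=2, replace=False)]
    err = [0.0, 0.0]
    edges = {}  # (i, j) with i < j -> edge age
    for t in range(1, n_iter + 1):
        x = X[rng.integers(len(X))]
        d = [float(np.sum((w - x) ** 2)) for w in W]
        s1, s2 = (int(k) for k in np.argsort(d)[:2])  # nearest and second-nearest nodes
        err[s1] += d[s1]
        W[s1] += eps_b * (x - W[s1])  # move the winner towards the sample
        for (i, j) in edges:  # age the winner's edges and nudge its neighbours
            if s1 in (i, j):
                edges[(i, j)] += 1
                nb = j if i == s1 else i
                W[nb] += eps_n * (x - W[nb])
        edges[(min(s1, s2), max(s1, s2))] = 0  # connect (or refresh) winner and runner-up
        edges = {e: age for e, age in edges.items() if age <= max_age}
        if t % lam == 0 and len(W) < max_nodes:  # insert a node where error is largest
            q = int(np.argmax(err))
            nbrs = [j if i == q else i for (i, j) in edges if q in (i, j)]
            if nbrs:
                f = max(nbrs, key=lambda k: err[k])
                W.append(0.5 * (W[q] + W[f]))
                r = len(W) - 1
                del edges[(min(q, f), max(q, f))]
                edges[(q, r)] = edges[(f, r)] = 0
                err[q] *= alpha
                err[f] *= alpha
                err.append(err[q])
        err = [e * beta for e in err]  # global error decay
    return np.array(W), edges


def cluster_reads(X, W, edges):
    """Label each read by the connected component (union-find) of its nearest GNG node."""
    parent = list(range(len(W)))

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for i, j in edges:
        parent[find(i)] = find(j)
    nearest = [int(np.argmin(np.sum((W - x) ** 2, axis=1))) for x in X]
    return [find(k) for k in nearest]


# Hypothetical usage: 4 synthetic reference "squiggles", 10 noisy, time-stretched reads each.
rng = np.random.default_rng(1)
references = [rng.normal(size=12) for _ in range(4)]
reads = []
for b in range(4):
    for _ in range(10):
        warped = np.repeat(references[b], rng.integers(1, 3, size=12))  # random local stretching
        reads.append(warped + rng.normal(scale=0.05, size=len(warped)))
X = np.array([vectorise(r, references) for r in reads])
W, edges = growing_neural_gas(X)
labels = cluster_reads(X, W, edges)
```

DTW is used here because nanopore signals for the same barcode vary in dwell time, so a time-stretched copy of a reference still scores close to zero against it. The GNG grows nodes only where the data are, so the number of clusters does not need to be fixed in advance. This fits an unsupervised setting where the number of cells is not known exactly.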