Spatial analysis of gene expression patterns has been a key technique for revealing the potential functions of genes. Traditionally, these analyses, conducted using in situ hybridization and other labor-intensive protocols, were constrained to examining only a few candidate genes per sample. However, the advent of spatial transcriptomic techniques like Slide-seqV2 has transformed this field, enabling massively parallel exploration of gene expression patterns within their tissue contexts by pairing spatial locations with RNA sequencing. Despite this potential, Slide-seqV2 datasets often produce fewer usable reads than expected. We have identified that a significant source of errors in the technology stems from the chemical synthesis of the barcodes used in Slide-seqV2. These errors are systematic, and in many cases they can be bioinformatically identified and corrected. We have developed “Syrah,” an analysis pipeline that identifies and corrects barcode errors in Slide-seqV2 and Curio Seeker datasets. Syrah can dramatically enhance read numbers in Slide-seqV2 datasets, recovering up to 35% more reads, reassigning erroneous barcode matches, and removing improperly formed reads. Unlike other dataset improvement methods that rely on data-driven imputation, Syrah uses a biochemical model and the barcode sequence data, and it does not require additional datasets or intricate calculations. This technique promises to transform the utility of Slide-seqV2 and Curio Seeker datasets by identifying usable reads that were discarded by previous analyses requiring exact matching of barcode sequences.
Address reprint requests to: Alejandro Sánchez Alvarado
CITATION
Brewster C, Mann FG Jr, Benham-Pyle B, Sánchez Alvarado A. Syrah: a pipeline to maximize spatial transcriptomics data output. G3 (Bethesda). 2026 May 4:jkag107. doi: 10.1093/g3journal/jkag107. Epub ahead of print. PMID: 42081452