Statistics Seminar: Dr. Zuoheng Wang
Speaker: Dr. Zuoheng Wang, Yale University
Title: Gene Graph-based Imputation for scRNA-seq Data
Single-cell RNA sequencing (scRNA-seq) technology measures gene expression at higher resolution, making it possible to study cell-level expression heterogeneity in different tissues. However, one major challenge in scRNA-seq data analysis is low capture efficiency, which results in a large proportion of zeros in the data matrix. For genes with low or moderate expression, this leads to unreliable read counts that may obstruct downstream analysis. We propose G2S3, a gene graph-based imputation method that borrows information across neighboring genes on the graph to denoise the expression data and fill in the dropouts. G2S3 learns a sparse graph structure from each gene’s expression profile under the assumption that biological signal changes smoothly between genes that lie close together on the graph. We then harness this gene network to impute the data matrix by constructing a network-based diffusion process. Using down-sampling experiments on real data, we demonstrate that G2S3 can accurately recover true expression levels and improve both the clustering of cell populations and differential expression analysis. G2S3 can also restore gene-gene regulatory relationships that might be obscured by the dropouts. Lastly, G2S3 is computationally efficient for large scRNA-seq datasets and is able to impute data with hundreds of thousands of cells, which have become available with advances in sequencing technology.
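For readers unfamiliar with graph-based imputation, the general idea of diffusing expression values over a gene graph can be sketched as follows. This is only an illustration under stated assumptions and is not the G2S3 algorithm: the graph here is built from thresholded Pearson correlations with a fixed number of neighbors, whereas G2S3 learns its sparse graph from the data. The function name `diffuse_impute` and the parameters `k` and `t` are hypothetical.

```python
import numpy as np

def diffuse_impute(X, k=2, t=1):
    """Smooth a cells-by-genes matrix X over a simple gene graph.

    Assumption: the graph comes from positive Pearson correlations between
    genes, keeping the k strongest neighbors per gene. This stands in for
    the learned sparse graph described in the abstract.
    """
    n_genes = X.shape[1]

    # Gene-gene similarity: positive part of the Pearson correlation.
    with np.errstate(invalid="ignore", divide="ignore"):
        S = np.corrcoef(X, rowvar=False)
    S = np.clip(np.nan_to_num(S), 0.0, None)  # constant genes give NaN
    np.fill_diagonal(S, 0.0)

    # Sparsify: keep the k most similar genes for each gene.
    W = np.zeros_like(S)
    rows = np.arange(n_genes)[:, None]
    idx = np.argsort(-S, axis=1)[:, :k]
    W[rows, idx] = S[rows, idx]
    W = np.maximum(W, W.T)   # make the graph undirected
    W += np.eye(n_genes)     # self-loop: each gene keeps part of its own signal

    # Row-normalize into a transition matrix and take t diffusion steps.
    P = W / W.sum(axis=1, keepdims=True)
    return X @ np.linalg.matrix_power(P.T, t)

# Toy data: 5 cells, 3 genes; cell 3 has a zero for gene 0.
X = np.array([[1, 1, 5],
              [2, 2, 4],
              [3, 3, 3],
              [0, 4, 2],
              [5, 5, 1]], dtype=float)
imputed = diffuse_impute(X, k=1, t=1)
```

In this toy example, the zero for gene 0 in cell 3 is replaced by a weighted average of its own value and that of its graph neighbor, so it becomes positive. Larger `t` spreads information further along the graph at the cost of more smoothing.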
Open to all
Department of Mathematical Sciences