T12 – Single-cell RNA-Seq Data Analysis
Date: Sunday, September 9, 2018
Time: 9:00 – 12:30
Dr. Panagiotis Papasaikas 1,3
Dr. Atul Sethi 1,2,3
1. Friedrich Miescher Institute for Biomedical Research (FMI), Basel, Switzerland
2. University Hospital Basel, University of Basel, Switzerland
3. Swiss Institute of Bioinformatics, Switzerland
Over the last few years, single-cell RNA-sequencing (scRNAseq) has emerged as a revolutionary tool with the potential to explore biological heterogeneity at the most basic level of organismal organization. Several questions inaccessible to bulk RNAseq can now be addressed, or at least probed in a meaningful manner. Examples include detailed mapping of the cellular composition of tissues and identification of novel cell types, characterization of their individual transcriptomes, discrimination of transcriptional variation arising at the single-cell versus the cell-ensemble level, and highlighting the contribution of individual cells to tissue differentiation, development and disease progression.
This promise comes with multiple technical and computational challenges. While scRNAseq data is structurally similar to bulk RNA-seq data, the paucity of starting material, combined with multiple confounded sources of variance, results in a low signal-to-noise ratio, exemplified by the high abundance of zeros in the gene expression matrices. In this new setting, existing techniques need to be modified, or novel approaches developed, for downstream analyses.
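To make the "abundance of zeros" concrete, here is a minimal sketch in plain Python. It is not taken from the tutorial materials (which are R scripts); the toy matrix, its values, and the function names `zero_fraction` and `detected_genes_per_cell` are invented for illustration, and genes are assumed to be rows and cells to be columns.

```python
# Toy count matrix (invented values): rows are genes, columns are cells.
counts = [
    [0, 3, 0, 0],
    [5, 0, 0, 1],
    [0, 0, 0, 0],
    [2, 0, 7, 0],
]

def zero_fraction(matrix):
    """Fraction of entries in the matrix that are exactly zero."""
    total = sum(len(row) for row in matrix)
    zeros = sum(1 for row in matrix for value in row if value == 0)
    return zeros / total

def detected_genes_per_cell(matrix):
    """Number of genes with a non-zero count in each cell (column)."""
    n_cells = len(matrix[0])
    return [sum(1 for row in matrix if row[j] > 0) for j in range(n_cells)]

print(zero_fraction(counts))            # 0.6875
print(detected_genes_per_cell(counts))  # [2, 1, 1, 1]
```

Both quantities are commonly inspected early on: the zero fraction summarizes sparsity, and the number of detected genes per cell is one simple proxy for library complexity.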
The proposed tutorial will be broadly divided into two parts:
- A general overview of single-cell transcriptomics, and single-cell based sequencing technologies. In particular, we’ll discuss the limitations of bulk workflows that can be overcome with single-cell analyses, as well as the advantages and limitations of single-cell analyses in gathering quantitative data.
- A practical session highlighting several of the most critical and common issues associated with the computational analysis of scRNAseq data:
  - Characteristics, comparison and limitations of scRNAseq data generated from the different protocols and commercially available systems
  - Data pre-processing and quality control
  - Data visualization and detection of biologically meaningful subpopulations
  - Differential gene expression
  - Sources of biological and technical variation and circumvention of confounding effects
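As an illustration of the last item above, one simple way to account for a known confounder is to regress it out of each gene's expression and keep the residuals. The sketch below is plain Python and is only one possible approach under stated assumptions, not necessarily the method used in the tutorial; the function name `regress_out` and the toy values are invented, and the covariate stands in for something like per-cell library complexity.

```python
def regress_out(expression, covariate):
    """Return residuals of a one-covariate least-squares fit.

    expression: per-cell expression values for one gene.
    covariate:  per-cell values of the confounder (e.g. library complexity).
    The residuals are centered and uncorrelated with the covariate.
    """
    n = len(expression)
    mean_x = sum(covariate) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    if sxx == 0:
        # Constant covariate: nothing to regress, only center the values.
        return [y - mean_y for y in expression]
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, expression))
    slope = sxy / sxx
    return [(y - mean_y) - slope * (x - mean_x) for x, y in zip(covariate, expression)]

# Invented example: four cells, one gene, one confounder.
covariate = [1.0, 2.0, 3.0, 4.0]
expression = [1.0, 2.5, 2.5, 4.0]
residuals = regress_out(expression, covariate)
print(residuals)
```

A linear fit removes only the linear component of the confounder; if the biological signal of interest is itself correlated with the covariate, regressing it out also removes part of that signal, which is one of the trade-offs to weigh when choosing a correction.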
This tutorial is designed as a guided conversation through scRNAseq analyses, combining lecture and hands-on sessions. It aims to give the audience a feel for the data and to walk them through the major analysis techniques and concepts, using illustrative examples and R scripts that are applicable or extendable to the most commonly available types of scRNAseq data.
The learning outcomes of this tutorial for the audience:
- Gain basic knowledge about scRNAseq protocols and the kinds of data they produce.
- Perform basic QC, filtering (reads, cells, genes), and normalization of scRNAseq data.
- Detect possible sources of technical and biological confounding (e.g. library complexity, cell cycle). Apply techniques to remove or account for these confounders in subsequent analyses and evaluate their strengths and weaknesses.
- Identify scRNAseq-specific challenges in visualization, clustering for subpopulation detection, and population marker identification.
- Implement the aforementioned concepts with practical examples from publicly available scRNAseq datasets, using custom R scripts provided in the tutorial.
- Evaluate the applicability of specific tools on different data types and problem settings/contexts.
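The QC, filtering and normalization steps listed among the outcomes above can be sketched roughly as follows. This is plain Python on an invented toy matrix, not the tutorial's R code; the function names, the thresholds (`min_counts`, `min_cells`) and the scale factor of 10,000 are illustrative assumptions rather than recommended values.

```python
import math

def filter_cells(matrix, min_counts):
    """Keep cells (columns) whose total count is at least min_counts."""
    n_cells = len(matrix[0])
    totals = [sum(row[j] for row in matrix) for j in range(n_cells)]
    keep = [j for j in range(n_cells) if totals[j] >= min_counts]
    return [[row[j] for j in keep] for row in matrix], keep

def filter_genes(matrix, min_cells):
    """Keep genes (rows) detected (count > 0) in at least min_cells cells."""
    keep = [i for i, row in enumerate(matrix)
            if sum(1 for value in row if value > 0) >= min_cells]
    return [matrix[i] for i in keep], keep

def log_normalize(matrix, scale=10000):
    """Scale each cell to a common library size, then apply log(1 + x)."""
    n_cells = len(matrix[0])
    totals = [sum(row[j] for row in matrix) for j in range(n_cells)]
    return [[math.log1p(row[j] / totals[j] * scale) if totals[j] > 0 else 0.0
             for j in range(n_cells)]
            for row in matrix]

# Invented toy data: 4 genes (rows) x 4 cells (columns).
counts = [
    [10, 0, 4, 0],
    [0, 0, 6, 1],
    [5, 1, 0, 0],
    [0, 0, 0, 0],
]

cells_ok, kept_cells = filter_cells(counts, min_counts=5)
genes_ok, kept_genes = filter_genes(cells_ok, min_cells=1)
normalized = log_normalize(genes_ok)
print(kept_cells, kept_genes)
```

Cells are filtered before genes here so that gene detection is assessed only on cells that passed QC; real analyses usually combine several per-cell metrics and choose thresholds from the observed distributions.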
The tutorial is aimed at computational biologists, bioinformaticians, and molecular biologists involved in transcriptomic data analysis, with any level of experience and an interest in the analysis of scRNAseq data. Working knowledge of R and of RNA-seq data analysis is assumed. R scripts will be provided for the hands-on session, leaving time for discussion of concepts and challenges in the field.
- Participants bring their own wifi-enabled laptops and connect to an R server set up by FMI to run the analyses.
- Internet access is required to reach the R server and the prepared datasets and code.
- 45 min. theoretical introduction – motivation, technologies, state of the art
- 2 hours hands-on session
  - Initial data exploration: characteristics of expression data, quality control, filtering genes and cells, visualization of scRNAseq data.
  - Identification and removal of confounding factors, e.g. cell cycle and cell complexity.
  - Clustering and defining cell populations.
  - Differential gene expression.
- 15 min. presentation of scRNAseq data and software repositories, and further reading
We recently organized an scRNAseq data analysis tutorial at the Basel Computational Biology Conference, BC2 2017 (https://ppapasaikas.github.io/BC2_SingleCell/). The proposed tutorial will be an updated version of the course taught at BC2.