T13 – Modern and scalable tools for efficient analysis of very large metagenomic datasets
Date: Saturday September 8, 2018
Time: 9:00 – 17:00
Venue: Stavros Niarchos Foundation Cultural Center
Room: A_Computer Room & Maker Space
This tutorial is aimed at bioinformatics practitioners with experience in command-line usage and scripting who want to learn about powerful tools for the efficient analysis of even very large metagenomic datasets. More than 50% of the workshop will consist of hands-on exercises.
The amount of data generated by metagenomics is growing rapidly, making data analysis the main bottleneck on the way to novel biological insights. The goal of this tutorial is to introduce modern bioinformatics tools and pipeline construction methods that will enable you to cope efficiently with the enormous amount of metagenomic data through modular, reproducible, workflow-based analysis.
We will first give a comprehensive summary of metagenomic tools for assembly, binning and taxonomic profiling by reviewing the results of the CAMI challenge. This should give you a sense of which tools best fit your own projects. We will then introduce the Common Workflow Language (CWL), which allows you to build reproducible and flexible metagenomic workflows.
In the afternoon session, we will train you in efficient protein-level metagenomic data analysis using the MMseqs2 software suite. The exercises cover efficient protein-level assembly, ultra-fast ORF clustering, sensitive homology search, and building goal-specific custom pipelines. Through hands-on exercises, you will learn how to build your own efficient workflows in MMseqs2 by combining its modules.
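To give a rough idea of what combining MMseqs2 modules can look like, here is a minimal sketch in Python that only assembles the command lines for a simple clustering workflow (`createdb`, `cluster`, `createtsv`) without running them. It is not taken from the tutorial material: the file names, the `work` directory, and the helper functions `clustering_workflow` and `render` are assumptions made for illustration, and the module defaults may not suit your data.

```python
import shlex


def clustering_workflow(fasta, workdir="work"):
    """Return MMseqs2 commands for a simple clustering workflow.

    All paths are hypothetical placeholders; adapt them to your setup.
    """
    db = f"{workdir}/seqDB"          # sequence database built from the FASTA file
    clu = f"{workdir}/cluDB"         # clustering result database
    tmp = f"{workdir}/tmp"           # scratch directory used by MMseqs2
    tsv = f"{workdir}/clusters.tsv"  # human-readable cluster table
    return [
        ["mmseqs", "createdb", fasta, db],          # convert FASTA to a database
        ["mmseqs", "cluster", db, clu, tmp],        # cluster the sequences
        ["mmseqs", "createtsv", db, db, clu, tsv],  # export clusters as TSV
    ]


def render(commands):
    """Render the commands as shell lines, one per step."""
    return "\n".join(shlex.join(cmd) for cmd in commands)


if __name__ == "__main__":
    print(render(clustering_workflow("proteins.fasta")))
```

Keeping each step as an argument list makes it easy to swap one module for another or to pass the steps to `subprocess.run` once MMseqs2 is installed.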
Sczyrba et al. (2017). Critical Assessment of Metagenome Interpretation—a benchmark of metagenomics software. Nature Methods, 14(11), 1063, https://www.nature.com/articles/nmeth.4458.
Steinegger, M., & Söding, J. (2017). MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, 35, 1026–1028, https://www.nature.com/articles/nbt.3988.
|09:00 – 09:15|Intro to shotgun metagenomics & applications|Sczyrba|
|09:15 – 09:45|Critical Assessment of Metagenome Interpretation: Results of the 1st CAMI Challenge|Sczyrba|
|09:45 – 10:30|CWL Introduction, Metagenomics Pipeline|Henke|
|10:30 – 11:00|Coffee break||
|11:00 – 12:30|Hands-on: Building your own CWL pipeline|Henke &|
|12:30 – 13:30|Lunch||
|13:30 – 14:00|MMseqs2 principles & algorithms|Söding|
|14:00 – 15:00|Hands-on: standard workflows in MMseqs2 (assembly, clustering, annotation)|Mirdita, Galiez, Söding|
|15:00 – 15:30|Coffee break||
|15:30 – 17:00|Hands-on: expert tools, how to build custom workflows in MMseqs2 (e.g. abundance analysis)|Mirdita, Galiez, Söding|
|17:00|End of workshop||
Intended audience and possible prerequisites
Bioinformaticians experienced in command-line usage and basic scripting.
Material or infrastructure required
You will need to bring your own notebook with an SSH client installed. Linux and macOS systems already have one; on Windows you can install one, e.g. PuTTY.
For the morning session, you will use the de.NBI Cloud infrastructure.
For the afternoon session, you will either run the software on your own notebook (preferred) or on the de.NBI Cloud. To run it locally, you need a notebook with VirtualBox installed, at least 8GB of RAM and 12GB of free disk space.
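If you want to check the disk space requirement before the workshop, a small Python sketch along these lines may help. It only covers the 12GB of free disk space (checking RAM portably would need more than the standard library), it interprets 1GB as 1024^3 bytes, which is an assumption, and the function name `has_enough_disk` is made up for this example.

```python
import shutil

REQUIRED_FREE_GB = 12  # free disk space stated in the requirements


def has_enough_disk(free_bytes, required_gb=REQUIRED_FREE_GB):
    """Return True if free_bytes covers required_gb (1 GB taken as 1024**3 bytes)."""
    return free_bytes >= required_gb * 1024**3


if __name__ == "__main__":
    # Check the drive that holds the current working directory.
    free = shutil.disk_usage(".").free
    status = "OK" if has_enough_disk(free) else "not enough"
    print(f"Free disk space: {free / 1024**3:.1f} GB ({status})")
```

Run it from the drive where you plan to store the VirtualBox image, since `shutil.disk_usage(".")` reports on the current directory's file system.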
Contact information for the organizers
Alexander Sczyrba (email@example.com, Bielefeld University, Germany)
Christian Henke (firstname.lastname@example.org, Bielefeld University, Germany)
Clovis Galiez (email@example.com, MPI for Biophysical Chemistry, Germany)
Milot Mirdita (firstname.lastname@example.org, MPI for Biophysical Chemistry, Germany)
Johannes Soeding (email@example.com, MPI for Biophysical Chemistry, Germany)