WS 1: Interpretable Deep Learning for single-cell data analysis (full day 9 am - 4 pm)
WS 2: Cell segmentation using KNIME Analytics Platform and its Keras/Tensorflow Integration (full day 9 am - 4 pm)
WS 3: User-friendly and reproducible genome annotation with MOSGA and Snakemake (half day 9 am - 12 pm)
WS 4: Tools for combined multi-omics analysis of microbiomes (half day 12 pm - 6 pm)
WS 5: Boost your Data Management Planning (half day 1 pm - 4 pm)
WS 1: Interpretable Deep Learning for single-cell data analysis (full day 9 am - 4 pm)
Organizers: Sara Al-Rawi, Harald Binder, Maren Hackenberg, Moritz Hess, Martin Treppner (University of Freiburg)
Participants: max. 30
Deep generative models (DGMs) such as Variational Autoencoders (VAEs) are versatile
statistical methods for analyzing high-dimensional data such as gene expression
measured in single cells (scRNA-seq). Specifically, they extract a low-dimensional
latent representation that helps in understanding the structure of the high-dimensional
data and is a prerequisite for many downstream analyses.
The flexibility of DGMs in capturing complex patterns is due to their neural network
building blocks. However, these networks (1) require careful tuning of hyperparameters
and (2) can also be regarded as opaque structures as they do not allow for a
straightforward interpretation of the latent space with respect to the expression of
individual genes.
In this workshop, we will address these two challenges. We will give advice on fitting DGMs and on interpreting the learned latent space in terms of its relation to the high-dimensional gene expression observed in scRNA-seq data.
Specifically, in the first part of this workshop, we will guide participants in choosing appropriate hyperparameters for DGMs. Here, we will focus on the network architectures and on the parameters of the stochastic gradient descent algorithm that is typically used to fit a DGM by minimizing a loss function. In the second part of the workshop, we will demonstrate approaches for interpreting the latent representations learned by DGMs.
Specifically, we will emphasize model-based and post-hoc approaches.
We recently published methods for interpreting latent representations learned by DGMs
(Hess et al., 2020), used DGMs for sample size calculations for scRNA-seq experiments
(Treppner et al., 2021), and investigated the overfitting of various DGM approaches under
sample size constraints (Nussberger et al., 2021; Lenz et al., 2021). We also published an
overview of state-of-the-art methods for the interpretability of DGMs, including accompanying Jupyter notebooks (Treppner et al., 2022).
All practicals will be held in the scientific programming language Julia, which allows easy and flexible fitting of the presented approaches.
WS 2: Cell segmentation using KNIME Analytics Platform and its Keras/Tensorflow Integration (full day 9 am - 4 pm)
Organizers: Janina Mothes, Martyna Pawletta (KNIME GmbH)
Participants: max 30
Image analysis is one of the hallmarks of biomedical research due to its wide range of potential applications. This includes enhancing our understanding of brain function by analyzing the connectivity of individual neuronal processes and synapses through serial transmission electron microscopy (EM). Machine learning approaches, in particular convolutional neural networks, allow the automatic segmentation of neural structures in EM images, an important step towards automating the extraction of neuronal connectivity.
The open source KNIME Analytics Platform offers an accessible tool based on the visual
programming paradigm to analyze diverse kinds of data, including images. In addition, one can
choose from a wide array of data transformations, machine learning algorithms, and visualizations and combine those in one reproducible workflow. KNIME Analytics Platform is freely available from https://www.knime.com/downloads.
In this hands-on tutorial, participants will build a workflow to create and train a specific
convolutional neural network (U-Net) for segmenting cell images. We will start by importing and cleaning up the input data (transmission electron microscopy data) [1,2]. With the help of the KNIME Keras/Tensorflow integration, we will then train a U-Net model and use the trained network to predict the segmentation of unseen data. In the last step, we will visualize our results.
Learning Objectives for Tutorial - Participants will learn how to build, train, and apply a convolutional neural network for cell segmentation in KNIME Analytics Platform.
Intended audience and level - Beginner
Students (grad/undergrad), researchers, and principal investigators with an interest in machine
learning, images, or data manipulation are welcome to attend the tutorial. Some background in machine learning and imaging data is a plus. We will provide a short introduction to the KNIME Analytics Platform, cell segmentation, and convolutional neural networks before starting the hands-on sessions.
Requirements
For the hands-on tutorial, participants need to bring their own laptop. All the necessary software
and data will be made available for download before the tutorial day.
Schedule
09.00-10.30: Introduction to KNIME Analytics Platform
- Installing KNIME Analytics Platform (hands-on)
- Understanding the KNIME workbench
10.30-12.00: Data wrangling in KNIME Analytics Platform (hands-on)
- Importing/Exporting data from/to files
- Data preprocessing
12.00-13.00: Break
13.00-14.30: Introduction to working with images in KNIME
Background of the problem
- Common applications of deep learning for image analysis
- What is cell segmentation?
- Getting to know the data (we will be using the 2D EM segmentation
challenge dataset [1,2])
14.30-16.00: Cell segmentation using KNIME Analytics Platform and its Keras/Tensorflow
Integration (hands-on)
- Train the model on preprocessed data
- Apply model to predict the segmentation of unseen data
- Visualization of the Results
- Deploy the trained model
- Q&A
REFERENCES
[1] Arganda-Carreras, Ignacio, et al. "Crowdsourcing the creation of image segmentation algorithms for connectomics." Frontiers in Neuroanatomy 9 (2015): 142.
[2] Cardona, Albert, et al. "An integrated micro- and macroarchitectural analysis of the Drosophila brain by computer-assisted serial section electron microscopy." PLoS Biology 8.10 (2010): e1000502.
WS 3: User-friendly and reproducible genome annotation with MOSGA and Snakemake (half day 9 am - 12 pm)
Organizers: Dr. Roman Martin (Philipps-Universität Marburg, AG Heider | Bioinformatics /Data Science in Biomedicine)
Participants: max 15
Eukaryotic draft genome annotation is typically performed by experienced users, since the barrier to entry is high for beginners in this field: knowledge of the available pipelines, correct parameters, computational tools, biological relatedness, and application linkages is required. Over the years, dozens of specific genomic prediction tools have been developed to identify regions of interest on a genomic sequence. Most annotation pipelines offer a fixed set of prediction tools with preconfigurations and lack either modularity or user-friendliness. Over time, this leads to the issue that bioinformaticians are required for repetitive genome annotations. Additionally, the resulting genome annotations may be biased towards configurations that are adapted to a specific taxonomic clade. Multiple attempts with various configurations are usually required to obtain better-suited annotations, making the results difficult to reproduce.
This hands-on tutorial will demonstrate how to extend MOSGA, a modular, scalable, reproducible, and user-friendly automated draft genome annotation framework, for de novo annotations. The framework internally uses the workflow management engine Snakemake and supports Conda, providing general scalability and reproducibility. At the same time, the web-based user interface allows easy-to-use parameterization, and the data layer ensures compliance with standard genome databases and formats. We will demonstrate how users can integrate new prediction tools of interest into the framework, including the user interface for parameterization and the display of predictions in a genome browser. Furthermore, we will take a closer look at the data layer, the abstraction of prediction tool outputs that allows more complex tasks such as importing new formats, conflict resolution, quality filtering, and writing output. Finally, the participants should be able to adapt the framework to their own tasks while retaining an easy-to-use interface, scalability, and reproducibility.
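To give a flavor of the kind of data-layer tasks mentioned above, the sketch below shows quality filtering and a simple overlap-based conflict resolution over gene predictions. It is a hypothetical toy in Python, not MOSGA's actual implementation or API: features are plain (start, end, score) tuples, and overlapping predictions are resolved by keeping the higher-scoring one.

```python
# Hypothetical gene predictions from different tools, as (start, end, score) tuples.
predictions = [
    (100, 500, 0.95),    # tool A
    (450, 900, 0.60),    # tool B, overlaps the first prediction
    (1200, 1500, 0.30),  # low-quality prediction
    (2000, 2600, 0.80),
]

MIN_SCORE = 0.5  # quality filtering threshold (illustrative value)

def overlaps(a, b):
    """True if two (start, end, score) features share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]

# Sort by score so higher-quality predictions claim their region first;
# drop low-scoring features and any feature conflicting with an already kept one.
kept = []
for feat in sorted(predictions, key=lambda f: f[2], reverse=True):
    if feat[2] >= MIN_SCORE and not any(overlaps(feat, k) for k in kept):
        kept.append(feat)

kept.sort()  # back to genomic order before writing output
print(kept)
```

A real annotation data layer also tracks strand, feature type, and source tool, and may merge rather than discard overlapping evidence; the point here is only the filter-then-resolve pattern.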
WS 4: Tools for combined multi-omics analysis of microbiomes (half day 12 pm - 6 pm)
Organizers: Dr. Robert Heyer (Otto von Guericke University, Magdeburg); Prof. Alexander Sczyrba (Bielefeld University); Kay Schallert (Otto von Guericke University, Magdeburg); Prof. D. Benndorf (Otto von Guericke University, Magdeburg, Anhalt University of Applied Sciences, Köthen)
Participants: max 30
Knowledge of the taxonomic and functional composition of microbiomes and of their activity is required to understand several diseases (e.g., inflammatory bowel disease), environmental processes (e.g., in soil), and biotechnological applications (e.g., biogas plants). To obtain this knowledge, one can analyze the microbial genes (metagenomics), transcripts (metatranscriptomics), proteins (metaproteomics), or metabolites (metabolomics). To analyze all these microbial features, researchers require, in addition to experimental skills, bioinformatics skills for data analysis and integration.
This workshop aims to demonstrate, for an example microbiome, a combined bioinformatics workflow for whole-genome sequencing [1] and metaproteomics analysis [2,3]. Furthermore, we will show how to map the omics features to metabolic pathways using the MPA_Pathway_Tool [4] and perform flux balance analysis.
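The core idea behind flux balance analysis can be stated compactly: find a flux vector v that satisfies the steady-state constraint S v = 0 (S being the stoichiometric matrix) while maximizing an objective flux. The toy sketch below illustrates just that constraint in plain Python with a hypothetical three-reaction pathway; real FBA tools solve this as a linear program with flux bounds rather than by checking candidates.

```python
# Toy stoichiometric matrix S (rows = metabolites, columns = reactions) for
# a hypothetical linear pathway:  R1: -> A,  R2: A -> B,  R3: B ->
S = [
    [1, -1,  0],  # metabolite A: produced by R1, consumed by R2
    [0,  1, -1],  # metabolite B: produced by R2, consumed by R3
]

def is_steady_state(S, v):
    """Check the core FBA constraint S @ v == 0 (no net metabolite accumulation)."""
    return all(sum(s_ij * v_j for s_ij, v_j in zip(row, v)) == 0 for row in S)

# Candidate flux distributions; objective = flux through the export reaction R3.
candidates = [[1, 1, 1], [2, 2, 2], [1, 2, 1]]
feasible = [v for v in candidates if is_steady_state(S, v)]
best = max(feasible, key=lambda v: v[2])
print("feasible:", feasible, "best:", best)
```

Here [1, 2, 1] violates mass balance (metabolite A would be drained), while the balanced distributions pass and the one with the highest objective flux wins.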
Provisional Schedule
12.00-13.00: Lecture: Introduction into the experimental multi-omics analysis of
microbiomes (Prof. Dirk Benndorf)
13.00-13.20: Lecture: Bioinformatics challenges for metagenome analysis (Prof. Sczyrba)
13.20-14.45: Hands-on: Bioinformatics assembly of reads to metagenome-assembled
genomes and their taxonomic and functional annotation (Prof. Sczyrba)
14.45-15.10: Short break
15.10-15.30: Lecture: Bioinformatics challenges for metaproteomics analysis
(Dr. Robert Heyer)
15.30-16.30: Hands-on: Identification of mass spectrometry data using the
MetaProteomeAnalyzer (Kay Schallert)
16.30-16.50: Lecture: Integration of multi-omics data from microbiomes (Dr. Robert Heyer)
16.50-17.50: Hands-on: Flexible creation of metabolic networks using the
MPA_Pathway_Tool, mapping of omics features to these pathways, and flux
balance analysis (Dr. Robert Heyer)
17.50-18.00: Closing
REFERENCES
[1] Sczyrba, A., Hofmann, P., Belmann, P., Koslicki, D., Janssen, S., Dröge, J., Gregor, I., et al., (2017) Critical assessment of metagenome interpretation—a benchmark of metagenomics software. Nature Methods, 14: 1063-1071, doi:10.1038/nmeth.4458
[2] Heyer, R., Schallert, K., Zoun, R., Becher, B., Saake, G., Benndorf, D., (2017). Challenges and perspectives of metaproteomic data analysis. Journal of Biotechnology, 261:24-36. doi: 10.1016/j.jbiotec.2017.06.1201
[3] Heyer, R., Schallert, K., (shared first), Büdel, A., Zoun, R., Dorl, S., Kohrs, F., Püttker, S., Siewert, C., Muth, T., Saake, G., Reichl, U., Benndorf, D., (2019) MPA-WORKFLOW: A robust and universal metaproteomics workflow for research studies and routine diagnostics within 24 h using phenol extraction, FASP digest, and the MetaProteomeAnalyzer. Frontiers in Microbiology, 10: 1883. doi: 10.3389/fmicb.2019.01883
[4] Walke, D., Schallert, K., Ramesh, P., Benndorf, D., Lange, E., Reichl, U., Heyer, R., (2021) MPA_Pathway_Tool: User-friendly, automatic assignment of microbial community data on metabolic pathways, Int. J. Mol. Sci. 2021, 22(20), 10992; doi.org/10.3390/ijms222010992
WS 5: Boost your Data Management Planning (half day 1 pm - 4 pm)
Organizers: Helena Schnitzer, Daniel Wibberg (ELIXIR Germany, Forschungszentrum Jülich)
Participants: max 30
Have you ever wondered what research data management really is? Why is it so important? Then ELIXIR-DE has created the perfect training for you!
More and more funders require research data management plans as a condition for their grants. Over the course of just three hours, you will learn how to turn research project proposals into proper Data Management Plans (DMPs). Data management experts will guide you through the basics of research data management, the dos and don'ts, and how to improve the management of data produced in research projects.
After a short introduction of the research data life cycle and the FAIR data principles, we will explore in multiple hands-on sessions what a data management plan (DMP) is. We will sink our teeth into components, language, software and examples of DMPs. In light of the FAIR principles, we will evaluate possible problems and solutions and self-assess a drafted DMP for our own projects.
Learning goals
Provisional schedule