Bioinformatics for microbiomes
Module 2

Assist. Prof. Sotirios Vasileiadis, University of Thessaly, Greece

• Assist. Prof. S. Vasileiadis, University of Thessaly, Greece
• Dr M. Tsiknia, Agricultural University of Athens, Greece
• Dr A. Meziti, Smallomics LP, Greece
• Dr K. Billis, The European Bioinformatics Institute, UK
• Prof K. Konstantinidis, Georgia Institute of Technology, USA
• Prof. D.G. Karpouzas, University of Thessaly, Greece

Learning outcomes

By the end of the module, students are expected to:

• Have a conceptual understanding of the main algorithms and strategies employed in the analysis of microbiomes
• Be familiar with the principles of the techniques employed for generating microbial data
• Have a good understanding of community-wide alpha/beta diversity metrics, differential abundance and correlation tests, and related methods
• Be able to perform software operations and basic statistical analysis tasks with the Mothur and R software
• Be able to understand and prepare graphs and illustrations for communicating microbiome analysis outputs

E-class: To be provided


1 Microbiome analysis in the era of big data. Strategies: questions and answers.
2 Generations of sequencing: Chemistry, errors and uses.
3 Next generation sequencing (NGS) of marker gene amplicons (MGA): from sampling, to raw data, to data quality control, to annotation and microbial data matrices. Units of phylogenetic markers: OTU, phylotype, genotype/ASV.
4 Quantitative/qualitative ecological aspects: alpha/beta diversity, core microbiome, differential microbial data features, correlations (between phylogenetic markers and environmental variables/features).
5 Random forests for assessing and predicting classification of samples and assessing importance of microorganisms/features.
6 Shotgun sequencing metagenomics for functional and phylogenetic annotation: approaches (shallow vs deep screening, feasibility), data/approach quality assessment/control, assembly (or not) strategies and algorithms, annotation algorithms and databases.
7 Sequencing-based metatranscriptomics: from sampling to sequencing library prep, reference-genome-based and reference-free analysis, data curation and differential expression algorithms.
8 Towards metagenomics systems biology: metaproteomics and environmental metabolomics.


9 Why Mothur and/or R?
Intro to Mothur: installation, running modes and commands.
Intro to R: installation of tools for working with R, R objects and packages.
10 R coding bootcamp (1): R as a calculator, installing/loading packages, objects, variables, operators and operations, location, command structure, getting help.
11 R coding bootcamp (2): data types, conditional statements.
12 R coding bootcamp (3): loops, building a function, illustrations.
13 MGA analysis hands-on (1): generating an OTU table with Mothur and preparing a phyloseq object.
14 MGA analysis hands-on (2): generating an ASV table with dada2 and preparing a phyloseq object.
15 MGA analysis hands-on (3): reducing the dataset and assessing treatment effects with random forests.
16 MGA analysis hands-on (4): alpha diversity analysis with ANOVA and non-parametric equivalents; multivariate approaches (hierarchical clustering, nMDS, assessing treatment effects with CCA/RDA and PERMANOVA) for beta diversity.
17 MGA analysis hands-on (5): differential abundance of taxa/features between treatments, core microbiomes, correlations among features and between features and parameters.
18 MGA analysis hands-on (6): assignment of essays, getting started, and supervision.
19 Exams (1 hour) and essay presentations (2 hours).



• Essay (50%)
• Written exams (50%)

Suggested readings

Journal Articles / Book chapters
01. Kovacevic & Simpson (2020). Chapter 1 – Fundamentals of environmental metabolomics. In Environmental Metabolomics. Álvarez-Muñoz, D and Farré M. (eds) Elsevier, pp 1-33.
02. Roesch et al. (2020) pime: A package for discovery of novel differences among microbial communities. Mol. Ecol. Resour. 20:415-428
03. Heyer et al. (2019) A Robust and Universal Metaproteomics Workflow for Research Studies and Routine Diagnostics Within 24 h Using Phenol Extraction, FASP Digest, and the MetaProteomeAnalyzer. Front Microbiol 10
04. Lucaciu et al. (2019) A Bioinformatics Guide to Plant Microbiome Analysis. Front. Plant Sci. 10, 18
05. Thompson et al. (2019) Machine learning to predict microbial community functions: An analysis of dissolved organic carbon from litter decomposition. PLOS ONE 14, e0215502
06. Quince et al. (2017) Shotgun metagenomics, from sampling to analysis. Nat Biotechnol 35, 833-844
07. Shakya et al. (2019) Advances and challenges in metatranscriptomic analysis. Front. Genet. 10, 904
08. Callahan et al. (2016) Bioconductor workflow for microbiome data analysis: from raw reads to community analyses [version 2; peer review: 3 approved]. F1000Research 5
09. Goodwin et al. (2016) Coming of age: ten years of next-generation sequencing technologies. Nat Rev Genet 17, 333-351
10. Kyrpides et al. (2016). Microbiome data science: understanding our microbial planet. Trends Microbiol 24, 425-427
11. Peimbert & Alcaraz (2016) A hitchhiker’s guide to metatranscriptomics. In Field Guidelines for Genetic Experimental Designs in High-Throughput Sequencing. Aransay, AM and Lavín Trueba JL. (eds) (Cham: Springer International Publishing), pp 313-342.
12. Schloss & Westcott (2011) Assessing and improving methods used in Operational Taxonomic Unit-based approaches for 16S rRNA gene sequence analysis. Appl. Environ. Microbiol. 77:3219-3226
13. Jost L (2007). Partitioning diversity into independent alpha and beta components. Ecology 88, 2427-2439
14. Jost L (2006). Entropy and diversity. Oikos 113, 363-375
15. Whittaker RH (1960). Vegetation of the Siskiyou Mountains, Oregon and California. Ecol Monogr 30, 279-338
Books
01. Beiko et al. (2018). Microbiome Analysis: Methods and Protocols (New York, NY: Humana Press).
02. Crawley MJ (2013). The R book. Second edition. (Chichester, West Sussex, United Kingdom: Wiley).
03. Borcard et al. (2011). Numerical ecology with R (New York, USA: Springer).
Online resources
01. Mothur:
02. Mothur MiSeq SOP:
03. DADA2 Pipeline Tutorial 1.18:
04. GUSTAME Ecol. stat. methods: