Introduction to Machine Learning: Opportunities for advancing omics data analysis

Material

Machine learning has emerged as a discipline that enables computers to assist humans in making sense of large and complex data sets. With the drop-in cost of sequencing technologies, large amounts of omics data are being generated and made accessible to researchers. Analysing these complex high-volume data is not trivial and the use of classical tools cannot explore their full potential. Machine learning can thus be very useful in mining large omics datasets to uncover new insights that can advance the field of medicine and improve health care.

The aim of this tutorial is to introduce participants to the Machine learning (ML) taxonomy and common machine learning algorithms. The tutorial will cover the methods being used to analyse different omics data sets by providing a practical context through the use of basic but widely used R and Python libraries. The tutorial will comprise a number of hands on exercises and challenges, where the participants will acquire a first understanding of the standard ML processes as well as the practical skills in applying them on familiar problems and publicly available real-world data sets.

ELIXIR / CODATA-RDA Research Data Science Advanced Workshop on Bioinformatics

Material

As part of the CODATA-RDA Research Data Science Summer School in Trieste (6-17 Aug 2018 and 5-16 Aug 2019), ELIXIR Training has contributed in the organization of the flanking Advanced Bioinformatics workshop (20-24 Aug 2018 and 19-23 Aug 2019) with a particular focus on Machine Learning applications, as part of the ELIXIR Implementation Study on Learning Paths.

This advanced bioinformatics course provided an overview of the current status of different NGS workflows (variant calling, RNA-Seq, ChIP-Seq, Metagenomics etc), and combined them with the appropriate Machine Learning and Data Mining approaches. After providing a strong foundation of the underlying theory and concepts, the course relied heavily on hand-on exercises and tutorials in order for the participants to directly apply and practice on methods and techniques presented throughout each day.

The course was led by me (INAB/CERTH, ELIXIR Greece), with Amel Ghouila (Institut Pasteur de Tunis / H3ABionet), Gabriele Schweikert (Cyber Valley Initiative, University of Tübingen, DE / Computational Biology, University of Dundee, UK) and Phelelani Mpangase (Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa) as co-instructors of the course. Moreover, the daily training activities were supported by three helpers; Maria Tsagiopoulou (INAB / CERTH), Ola Karrar (University of Khartoum) and a past learner to the CODATA-RDA Research Data Science Summer School, and David Helekal (University of Dundee).

Workshop on Reproducible analysis and Research Transparency

Material

This workshop was part of the Open Science Tools, Data & Technologies for Efficient Ecological & Evolutionary Research Symposium, organized by NIOO-KNAW and DANS-KNAW on 7 & 8 December 2017 at the Amsterdam Science Park.

This workshop provided an overview of the to date status in reproducible analysis in order to provide transparency in research. The workshop covered methodological topics (such as the use of the Open Science Framework and reporting guidelines) as well as software tools (such as Git, Docker, RMarkdown / knitr and Jupyter). Going beyond simple listing and presentations, the workshop focused on hands-on skill building, with exercises and tutorials covering most of the software aspects.