Introduction to Machine Learning - streamed
When: 22-23 July 2020, 09:00 - 16:30 CEST
Where: Online. Streamed from Greece and Mauritius. More information here
Instructors:
Helpers:
With the rise of high-throughput sequencing technologies, the volume of omics data has grown exponentially in recent years. A major challenge is to extract useful knowledge from these data, which are also heterogeneous in nature. Machine learning (ML) is a discipline in which computers learn from data without being explicitly programmed, helping humans make sense of large and complex data sets. Analysing complex, high-volume data is not trivial, and classical tools cannot exploit their full potential. Machine learning can thus be very useful for mining large omics data sets to uncover new insights that advance the field of bioinformatics.
This 2-day course will introduce participants to the machine learning taxonomy and to the application of common machine learning algorithms to omics data. It will cover the methods commonly used to analyse different omics data sets, providing a practical context through basic but widely used R libraries. The course comprises a number of hands-on exercises and challenges in which participants will acquire a first understanding of standard ML processes, as well as practical skills in applying them to familiar problems and publicly available real-world data sets.
At the end of the course, the participants will be able to:
This course is intended for Master's and PhD students, post-docs and staff scientists familiar with different omics data technologies who are interested in applying machine learning to analyse these data. No prior knowledge of machine learning concepts and methods is expected or required.
Familiarity with at least one programming language is required (preferably R).
This course will be streamed, so you will need your own computer with an internet connection. To ensure clear communication between instructors and participants, we will use collaborative tools such as Google Drive and/or Google Docs.
Maximum participants: 30
Day 1
Time | Details |
---|---|
09:00 - 09:30 | Course Introduction. - Welcome. - Introduction and Code of Conduct (CoC). - Ways to interact. - Practicalities (agenda, breaks, etc.). - Setup. Link to material |
09:30 - 10:00 | Introduction to Machine Learning (theory) |
10:00 - 11:30 | What is Exploratory Data Analysis (EDA) and why is it useful? (hands-on) - Loading omics data - PCA Link to material |
11:30 - 11:45 | Coffee Break |
11:45 - 12:15 | Introduction to Unsupervised Learning (theory) |
12:15 - 13:00 | Partitional Clustering: k-means (practical) Link to material |
13:00 - 14:00 | Lunch break |
14:00 - 14:45 | Partitional Clustering: k-means (practical) (cont’d) |
14:45 - 15:30 | Hierarchical Clustering (practical) Link to material |
15:30 - 15:45 | Coffee Break |
15:45 - 16:30 | Hierarchical Clustering (practical) (cont’d) |
16:30 | Closing of Day 1 |
Day 2
Time | Details |
---|---|
09:00 - 09:30 | Welcome Day 2. - Questions from Day 1 - Agenda |
09:30 - 10:00 | Introduction to Supervised Learning (theory) - Overview of multiple algorithms - Advantages and Disadvantages |
10:00 - 10:30 | Classification Metrics (theory) - F1 Score, Precision, Recall - Confusion Matrix, ROC-AUC |
10:30 - 11:30 | Classification (practical) - Decision trees - Random Forests Link to material |
11:30 - 11:45 | Coffee Break |
11:45 - 12:30 | Classification (practical) (cont’d) |
12:30 - 13:30 | Lunch break |
13:30 - 14:00 | Regression (theory) |
14:00 - 15:15 | Regression (practical) - Linear regression - Generalized Linear Model (GLM) Link to material |
15:15 - 15:30 | Coffee Break |
15:30 - 16:00 | Regression (practical) (cont’d) |
16:00 - 16:30 | Closing questions, Discussion |
The exam gives you the opportunity to apply what you learned during this course to a health-related data set.
If you finish all the exercises and wish to practise further, here are a couple of good examples to help you become more familiar with the different ML techniques and packages.
The material in the workshop has been based on the following resources:
Relevant literature includes:
Coordination: Monique Zahn
We will recommend 0.50 ECTS credits for this course, provided that you pass the exam at the end of the course. The exam description is available here; the exam is due on August 7th.
You are welcome to subscribe to the SIB courses mailing list, using the form here, to be informed of all future courses and workshops as well as all important deadlines.
SIB abides by the ELIXIR Code of Conduct. Participants of SIB courses are also required to abide by the same code.
For more information, please contact training@sib.swiss.
This material is made available under the Creative Commons Attribution 4.0 International license. Please see LICENSE for more details.
Shakuntala Baichoo, Wandrille Duchemin, Geert van Geest, Thuong Van Du Tran, Fotis E. Psomopoulos, & Monique Zahn. (2020, July 23). Introduction to Machine Learning (Version v1.0.0). Zenodo. http://doi.org/10.5281/zenodo.3958880