T1 – Automated Machine Learning for Bioinformatics and Computational Biology
Date: Sunday September 9, 2018
Time: 9:00 – 12:30 (half day tutorial)
Ioannis Tsamardinos, University of Crete, Gnosis Data Analysis
Kleio Maria Verrou, University of Crete
Vincenzo Lagani, Ilia State University, Gnosis Data Analysis
Supervised machine learning methods for predictive or diagnostic modeling are routinely applied to molecular data in bioinformatics and computational biology, for example to diagnose disease status, to predict survival or relapse time, or to identify which molecules are associated (in a multivariate fashion) with the concentration of some other molecule. They are also applied for knowledge discovery and for gaining insight into molecular mechanisms, in combination with feature selection for the identification of predictive biosignatures (sets of biomarkers that are collectively predictive). However, applying these methods requires significant expertise and effort; the procedure is prone to errors and methodological pitfalls, particularly when the sample size is small and the dimensionality is high, as is typical in bioinformatics. Such errors can introduce bias in the results, lead to erroneous biological findings, or even invalidate the whole analysis. Research in machine learning has been trying to automate this process, addressing issues such as the intelligent selection and combination of algorithms, automated tuning of their hyper-parameters, non-optimistic estimation of performance, feature selection and multiple feature selection, stability of feature selection, and many others, in what is called Automated Machine Learning (AutoML). The tutorial will present an introduction to this area for a bioinformatics-oriented audience. It will cover the problems and scope of AutoML, the pitfalls and problems facing the analyst, some basic algorithms, and the tools for AutoML.
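One of the pitfalls named above, optimistic estimation of performance when the sample size is small and the dimensionality is high, can be illustrated with a short sketch. It is not taken from the tutorial material. It assumes pure-noise data, a one-feature nearest-class-mean classifier, and hypothetical helper names (make_noise_data, class_means, best_feature, cv_accuracy, compare_protocols) chosen only for this illustration. It contrasts selecting a feature on all samples before cross-validation with selecting it inside each training fold.

```python
import random


def make_noise_data(n_samples, n_features, rng):
    """Pure-noise data: the features carry no information about the labels."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]  # balanced binary labels
    return X, y


def class_means(X, y, j, rows):
    """Mean of feature j in each class, computed only on the given rows."""
    pos = [X[i][j] for i in rows if y[i] == 1]
    neg = [X[i][j] for i in rows if y[i] == 0]
    return sum(pos) / len(pos), sum(neg) / len(neg)


def best_feature(X, y, rows):
    """Index of the feature with the largest gap between class means on rows."""
    def gap(j):
        m1, m0 = class_means(X, y, j, rows)
        return abs(m1 - m0)
    return max(range(len(X[0])), key=gap)


def cv_accuracy(X, y, n_folds, select_inside_folds):
    """Cross-validated accuracy of a one-feature nearest-class-mean rule."""
    n = len(y)
    all_rows = list(range(n))
    # Strided folds; with alternating labels and an odd number of folds,
    # every fold contains both classes.
    folds = [list(range(k, n, n_folds)) for k in range(n_folds)]
    # Flawed protocol: choose the feature once, using every sample,
    # including those later used as test data.
    global_choice = None if select_inside_folds else best_feature(X, y, all_rows)
    correct = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in all_rows if i not in held_out]
        # Sound protocol: repeat the selection on the training rows only.
        j = best_feature(X, y, train) if select_inside_folds else global_choice
        m1, m0 = class_means(X, y, j, train)
        for i in test:
            pred = 1 if abs(X[i][j] - m1) < abs(X[i][j] - m0) else 0
            correct += pred == y[i]
    return correct / n


def compare_protocols(n_repeats=10, n_samples=30, n_features=1000, seed=0):
    """Average both estimates over several independent noise datasets."""
    rng = random.Random(seed)
    leaky, honest = [], []
    for _ in range(n_repeats):
        X, y = make_noise_data(n_samples, n_features, rng)
        leaky.append(cv_accuracy(X, y, 5, select_inside_folds=False))
        honest.append(cv_accuracy(X, y, 5, select_inside_folds=True))
    return sum(leaky) / n_repeats, sum(honest) / n_repeats


leaky_acc, honest_acc = compare_protocols()
print(f"feature selected on all samples: {leaky_acc:.2f}")
print(f"feature selected inside folds:   {honest_acc:.2f}")
```

Because the features contain no signal, a sound estimate should stay close to chance (0.5). Selecting the feature on all samples first typically reports a clearly higher accuracy, even though nothing can be predicted. Automating the whole pipeline, including feature selection, inside the performance-estimation loop is one way to avoid this kind of bias.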
More information: http://www.mensxmachina.org
The tutorial involves basic concepts of machine learning. However, we expect its main messages to be understandable to life scientists with no formal background in the subject.
Participants should bring their own laptops.
9:00 Introduction to the problem: knowledge discovery, feature selection, and biosignature discovery
9:30 Challenges of automated predictive modelling for biomedical applications:
- creating an analytics pipeline
- selecting and tuning the appropriate algorithms
- estimating performance
10:30 Pitfalls of automated predictive modelling
11:00 Tea/Coffee break
11:30 Hands-on session
12:15 Future directions, wrap up, questions
12:30 End of program