T1 – Automated Machine Learning for Bioinformatics and Computational Biology




Date: Sunday September 9, 2018

Time: 9:00 – 12:30 (half day tutorial)

Venue: Stavros Niarchos Foundation Cultural Center

Room: A2.02.058 (Projection Room)



Ioannis Tsamardinos, University of Crete, Gnosis Data Analysis

Kleio Maria Verrou, University of Crete

Vincenzo Lagani, Ilia State University, Gnosis Data Analysis



Supervised machine learning methods for predictive or diagnostic modeling are routinely applied to molecular data in bioinformatics and computational biology, for example, to diagnose disease status, to predict survival or relapse time, or to identify which molecules are associated (in a multivariate fashion) with the concentration of some other molecule. They are also applied for knowledge discovery and for gaining insight into molecular mechanisms, in combination with feature selection for the identification of predictive biosignatures (sets of biomarkers that are collectively predictive). However, applying these methods requires significant expertise and effort; the procedure is prone to errors and methodological pitfalls, particularly when the sample size is small and the dimensionality is high, as in typical bioinformatics settings. Such errors can introduce bias in the results, lead to erroneous biological findings, or even invalidate the whole analysis. Research in machine learning has been trying to automate the process, dealing with issues such as the intelligent selection and combination of algorithms, automated tuning of their hyper-parameters, non-optimistic estimation of performance, feature selection and multiple feature selection, the stability of feature selection, and many others, in what is called Automated Machine Learning (AutoML). The tutorial will present an introduction to this area for a bioinformatics-oriented audience. It will cover the problems and scope of AutoML, the pitfalls and problems facing the analyst, some basic algorithms, and the tools for AutoML.
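As an illustration of what non-optimistic performance estimation can involve, one widely used protocol is nested cross-validation: hyper-parameters are tuned on inner folds drawn only from the training portion, so the outer test fold never influences tuning. The sketch below only generates the index splits, using the Python standard library. It is a minimal example under our own assumptions and not necessarily the protocol presented in the tutorial; the function names `k_fold_indices` and `nested_cv_splits` are hypothetical.

```python
import random


def k_fold_indices(indices, k, seed=0):
    """Shuffle `indices` and split them into k nearly equal folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def nested_cv_splits(n_samples, outer_k=5, inner_k=3, seed=0):
    """Yield (outer_test, inner_splits) pairs for nested cross-validation.

    `inner_splits` is a list of (inner_train, inner_validation) index lists
    drawn only from the outer training samples, so the outer test fold is
    never seen while tuning hyper-parameters.
    """
    outer_folds = k_fold_indices(range(n_samples), outer_k, seed)
    for i, outer_test in enumerate(outer_folds):
        # Everything outside the current outer test fold is available for tuning.
        outer_train = [s for j, fold in enumerate(outer_folds) if j != i for s in fold]
        inner_folds = k_fold_indices(outer_train, inner_k, seed + i + 1)
        inner_splits = []
        for m, inner_val in enumerate(inner_folds):
            inner_train = [s for j, fold in enumerate(inner_folds) if j != m for s in fold]
            inner_splits.append((inner_train, inner_val))
        yield outer_test, inner_splits


# Hypothetical usage: 20 samples, 5 outer folds, 3 inner folds.
splits = list(nested_cv_splits(n_samples=20, outer_k=5, inner_k=3))
```

In a full analysis, each candidate configuration would be scored on the inner validation folds, the best one refit on all outer training samples, and only then evaluated once on the held-out outer test fold.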

More information: http://www.mensxmachina.org



The tutorial covers basic machine learning concepts. However, we expect its main messages to be accessible to life scientists with no formal background in the subject.



Participants should bring their own laptops.



Time      Subject

9:00        Introduction to the problem: knowledge discovery, feature selection, and biosignature discovery

9:30        Challenges of automated predictive modelling for biomedical applications:

  • creating an analytics pipeline
  • selecting and tuning the appropriate algorithms
  • estimating performance

10:30     Pitfalls of automated predictive modelling

11:00     Tea/Coffee break

11:30     Hands-on session

12:15     Future directions, wrap up, questions

12:30     End of program