Research Theme: Machine Learning

Live Projects
Federated Machine Learning
Continual Learning
Self-supervised Learning
Machine Learning 'in the wild'

In federated learning, the training data remains distributed on the devices that hold it (mobile devices, for instance), and the model is learned by aggregating locally computed updates. In a healthcare setting, different hospital sites can thereby achieve better diagnostic performance than they would by training machine learning models only on their own proprietary data. Each partner trains a model on its own data and shares the model's parameters (but not the data); these are combined into a more robust "global" model, which is then shared among the partners.
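The aggregation step can be illustrated with a minimal sketch of federated averaging. It assumes a linear least-squares model, synthetic data for three hypothetical sites, and aggregation weighted by local sample count; the function names `local_update`, `federated_average` and `train_federated` are illustrative and not taken from the project.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Run a few gradient steps of least-squares regression on one site's private data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_average(site_weights, site_sizes):
    """Average the sites' parameters, weighted by their number of local samples."""
    sizes = np.asarray(site_sizes, dtype=float)
    stacked = np.stack(site_weights)
    return (sizes[:, None] * stacked).sum(axis=0) / sizes.sum()

def train_federated(sites, dim, rounds=50):
    """Alternate local training and aggregation; only parameters leave each site."""
    global_w = np.zeros(dim)
    for _ in range(rounds):
        local_ws = [local_update(global_w, X, y) for X, y in sites]
        global_w = federated_average(local_ws, [len(y) for _, y in sites])
    return global_w

# Synthetic data (an assumption for illustration): three sites of different sizes.
rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 0.5])
sites = []
for n in (40, 60, 100):
    X = rng.normal(size=(n, 3))
    y = X @ true_w + 0.01 * rng.normal(size=n)
    sites.append((X, y))

global_w = train_federated(sites, dim=3)
```

In this sketch the raw arrays `X` and `y` never leave the loop that builds each site; only the parameter vectors returned by `local_update` are combined.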

While interesting work has recently been directed at continual learning from streaming data in a fully supervised setting, with a particular focus on avoiding catastrophic forgetting, continual learning in a semi-supervised setting remains a wide-open research question. In our approach, the problem is reduced to a continual supervised learning setting under a 'multiple worlds' assumption, in which we first seek, in an incremental fashion, the most likely labelling(s) of the current data stream.

Self-supervised learning allows us to exploit the variety of labels that usually come with the data for free. Technically, the idea is to define an auxiliary task for which we already have labels, 'hidden' within the structure of the data itself. This is done by defining a self-supervised task, also known as a pretext task, which leads to a supervised loss function. However, no theoretical foundations for self-supervised learning exist yet. The purpose of our work is to provide a theoretical justification for self-supervised learning through a combination of functional analysis and optimisation theory.
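As a concrete illustration of a pretext task, the sketch below uses rotation prediction, a commonly used example chosen here as an assumption rather than the task studied in the project. Each image is rotated by a random multiple of 90 degrees and the rotation index becomes a label obtained for free; a standard cross-entropy then serves as the supervised loss. The helper names are hypothetical.

```python
import numpy as np

def make_rotation_pretext(images, rng):
    """Rotate each square image by k * 90 degrees; k (0..3) is the free label."""
    labels = rng.integers(0, 4, size=len(images))
    rotated = np.stack([np.rot90(img, k=int(k)) for img, k in zip(images, labels)])
    return rotated, labels

def cross_entropy(logits, labels):
    """Mean cross-entropy of integer labels under softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
images = rng.random((8, 6, 6))          # unlabelled data (synthetic, for illustration)
rotated, labels = make_rotation_pretext(images, rng)

# A model would map `rotated` to 4 logits per image; uniform logits stand in here,
# so the loss equals log(4).
logits = np.zeros((len(rotated), 4))
loss = cross_entropy(logits, labels)
```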

Our goal is a blue-sky rethinking of machine learning, laying the foundations for an entirely new, inherently robust theory of learning. We generalise statistical learning theory to allow test and training data to come from distinct probability distributions. We move away from the selection of single models to that of convex sets of models, and employ the resulting theory to lay solid theoretical foundations for deep learning.

Past Projects
Metric learning
Tensor classification
Vehicle classification from inductive loop signature

We devised a general framework for learning distance functions for generative dynamical models, given a training set of labelled videos. The optimal distance function is selected from a family of pullback distances, induced by a parameterised automorphism of the space of models. We focus here on hidden Markov models and their manifold, and design an appropriate automorphism there. Experimental results show that pullback learning greatly improves action recognition performance with respect to the base distances.
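The pullback idea can be sketched on a toy problem. The example below replaces the space of hidden Markov models with plain vectors in R^2 and uses a positive per-coordinate scaling as the parameterised invertible map; both are simplifying assumptions made for illustration, as are the function names and the leave-one-out nearest-neighbour criterion. The pullback distance is the base (Euclidean) distance measured after applying the map, and the parameter is chosen to maximise accuracy on the labelled training set.

```python
import numpy as np

def pullback_distance(x, y, scale):
    """Base Euclidean distance pulled back through the invertible map F(v) = scale * v."""
    return np.linalg.norm(scale * x - scale * y)

def loo_nn_accuracy(X, labels, scale):
    """Leave-one-out 1-nearest-neighbour accuracy under the pullback distance."""
    Z = X * scale
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)          # a point may not be its own neighbour
    return float((labels[d.argmin(axis=1)] == labels).mean())

def select_scale(X, labels, candidates):
    """Pick the map parameter whose pullback distance classifies the training set best."""
    return max(candidates, key=lambda s: loo_nn_accuracy(X, labels, s))

# Toy data: the class signal lives in coordinate 0, coordinate 1 is a nuisance.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 40)
X = np.column_stack([
    np.where(labels == 0, -1.0, 1.0) + 0.3 * rng.normal(size=80),
    5.0 * rng.normal(size=80),
])

candidates = [np.array([1.0, 1.0]), np.array([1.0, 0.1]), np.array([0.1, 1.0])]
best = select_scale(X, labels, candidates)
base_acc = loo_nn_accuracy(X, labels, candidates[0])   # identity map = base distance
best_acc = loo_nn_accuracy(X, labels, best)
```

Because the identity map is among the candidates, the selected pullback distance can never score worse on the training set than the base distance.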

In most real-world problems, however, observations are influenced by a number of nuisance factors. To tackle their influence, it is natural to resort to multi-linear or "tensorial" decompositions. We show how the higher-order singular value decomposition (HOSVD) can be exploited to formulate a natural generalization of Tenenbaum's bilinear classifiers, which we call 'multilinear classifiers', able to classify observations depending on one content label and several style labels.
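For readers unfamiliar with HOSVD, the following minimal NumPy sketch computes it for a small random tensor. It shows only the decomposition itself, not the multilinear classifier built on top of it; the tensor shape and the reading of its axes as one content factor and two style factors are assumptions for illustration.

```python
import numpy as np

def unfold(tensor, mode):
    """Flatten a tensor into a matrix whose rows index the given mode."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply a tensor by a matrix along one mode."""
    moved = np.moveaxis(tensor, mode, 0)
    result = np.tensordot(matrix, moved, axes=(1, 0))
    return np.moveaxis(result, 0, mode)

def hosvd(tensor):
    """Return the core tensor and one orthonormal factor matrix per mode."""
    factors = []
    for mode in range(tensor.ndim):
        U, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(U)
    core = tensor
    for mode, U in enumerate(factors):
        core = mode_product(core, U.T, mode)
    return core, factors

def reconstruct(core, factors):
    """Rebuild the tensor from its core and factor matrices."""
    out = core
    for mode, U in enumerate(factors):
        out = mode_product(out, U, mode)
    return out

rng = np.random.default_rng(0)
data = rng.normal(size=(4, 3, 5))        # hypothetical axes: content x style 1 x style 2
core, factors = hosvd(data)
rebuilt = reconstruct(core, factors)
```

Each factor matrix spans the variation along one mode, which is what allows content and style factors to be handled separately.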

Inductive loops are sensors that are widely deployed on road networks for the purpose of traffic data collection. Our aim is to classify vehicles from inductive loop signals according to a 10-category scheme such as SWISS10. We looked at two machine learning algorithms: Support Vector Machines and Adaptive Boosting with decision stumps. We used the two most common strategies for multiclass classification, One-versus-One and One-versus-Rest, and we addressed class imbalance with undersampling, oversampling and SMOTE.
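The multiclass strategies and resampling schemes mentioned above can be sketched in plain NumPy. The class counts and features below are synthetic, the function names are hypothetical, and `smote_like` is a simplified version of SMOTE (interpolation between a minority sample and one of its k nearest minority neighbours), not the reference implementation.

```python
import numpy as np
from itertools import combinations

def one_vs_rest_problems(y):
    """One binary target per class: 1 for that class, 0 for all others."""
    return {c: (y == c).astype(int) for c in np.unique(y)}

def one_vs_one_problems(y):
    """One sample mask per unordered class pair, keeping only those two classes."""
    return {(a, b): np.isin(y, (a, b)) for a, b in combinations(np.unique(y), 2)}

def random_oversample(X, y, rng):
    """Duplicate random samples until every class matches the largest class."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        extra = rng.choice(members, size=target - n, replace=True)
        idx.append(np.concatenate([members, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

def smote_like(X_min, n_new, k, rng):
    """Synthesise points between minority samples and their k nearest minority neighbours."""
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])

rng = np.random.default_rng(0)
counts = [100, 5, 8, 30, 12, 6, 20, 9, 15, 7]       # imbalanced 10-class toy data
y = np.repeat(np.arange(10), counts)
X = rng.normal(size=(len(y), 4))

ovr = one_vs_rest_problems(y)                        # 10 binary problems
ovo = one_vs_one_problems(y)                         # 10 * 9 / 2 = 45 binary problems
X_bal, y_bal = random_oversample(X, y, rng)
X_syn = smote_like(X[y == 1], n_new=20, k=3, rng=rng)
```

With 10 classes, One-versus-Rest trains 10 binary classifiers on all the data, whereas One-versus-One trains 45 classifiers, each on the samples of two classes only.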