Research Theme: Computer Vision
Live Projects
Deep learning for action detection
Deep prediction of future actions
Deep video captioning
Deep modelling of complex activities

We developed a deep learning framework able to localise multiple action instances in real time in the form of 'action tubes'. It has outperformed all competing approaches in accuracy, while running faster than real time (up to 52 fps).
For a good summary of current action detection research, see https://github.com/jinwchoi/awesome-action-recognition
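As a rough illustration of the 'action tube' idea, the sketch below greedily links per-frame detections into tubes based on box overlap. The detector, thresholds and scoring scheme are hypothetical placeholders, not the lab's actual real-time pipeline.

```python
# Illustrative sketch only: greedy linking of per-frame action detections
# into 'action tubes'. Boxes are (x1, y1, x2, y2) tuples with a confidence score.

def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def link_tubes(frame_detections, iou_thresh=0.3):
    """frame_detections: list over frames of [(box, score), ...]."""
    tubes = []                                   # each tube: list of (frame_idx, box, score)
    for t, dets in enumerate(frame_detections):
        unmatched = list(dets)
        for tube in tubes:
            last_t, last_box, _ = tube[-1]
            if last_t != t - 1 or not unmatched:
                continue                         # tube not alive in the previous frame
            # extend the tube with the best-overlapping, highest-scoring detection
            best = max(unmatched, key=lambda d: (iou(last_box, d[0]), d[1]))
            if iou(last_box, best[0]) >= iou_thresh:
                tube.append((t, best[0], best[1]))
                unmatched.remove(best)
        # remaining detections start new tubes
        tubes.extend([[(t, box, score)] for box, score in unmatched])
    return tubes
```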

Emerging applications of artificial intelligence are bringing about important paradigm shifts in machine learning and computer vision. Machines need a comprehensive awareness of what takes place in complex environments, and the ability to use this understanding to make predictions about the future behaviour of other machines and humans.

We are working in collaboration with AI Labs, Bologna, and Prof Thomas Lukasiewicz, Oxford University, on a neuro-symbolic approach to video captioning that combines our leading action detection technology with the latest advances in symbolic and ontological reasoning to deliver realistic natural language descriptions.

This is a wide-reaching effort, separately involving the University of Naples Federico II and Huawei Technologies, which aims to extend deep learning approaches to complex activities formed by a number of coordinated 'atomic' actions. In particular, we seek a novel deep learning formulation of part-based models tailored to the spatio-temporal nature of video.

Sports footage analysis
Deep learning for manufacturing
Unsupervised action localisation

We are exploring collaborations with innovative startups in the Oxford-London area to develop deep learning-based systems capable of automatically annotating sports footage, in terms of both single-player actions and overall team manoeuvres, in batch and real-time settings.

In collaboration with major companies such as Ocado and the BMW Group, we are exploring the use of deep learning to improve industrial processes such as production lines and logistics, via either direct collaboration or Horizon 2020 funding.

When training annotation of the location of actions is not available, weakly supervised learning can be employed, in combination with unsupervised video segmentation, to locate the actions of interest within an input video.
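One possible workflow, sketched below under stated assumptions, is multiple-instance style: unsupervised segmentation proposes candidate space-time regions, and a classifier trained only with video-level labels ranks them. The feature extraction, the proposal generation and the predict_proba-style interface are illustrative assumptions, not the lab's specific method.

```python
# Minimal sketch of weakly supervised localisation over unsupervised
# segmentation proposals (assumed workflow, for illustration only).
import numpy as np

def localise(proposal_features, classifier, target_class):
    """proposal_features: (num_proposals, feat_dim) array, one row per
    unsupervised segmentation proposal.
    classifier: any model exposing predict_proba, trained on video-level labels."""
    scores = classifier.predict_proba(proposal_features)[:, target_class]
    best = int(np.argmax(scores))
    # the top-scoring proposal is taken as the localised action of interest
    return best, scores[best]
```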

Past Projects
Part-based video deformable models
Identity recognition from gait
Laplacian methods for 3D human motion analysis
Automated visual weld inspection

We proposed an action classification framework in which actions are modelled by discriminative subvolumes, learned using weakly supervised training. The learned action models are used to simultaneously classify video clips and localise actions, by aggregating the subvolume scores into a dense space-time saliency map.
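The aggregation step can be pictured as follows; the subvolume representation (axis-aligned space-time boxes) and their scores are assumed inputs, and the snippet only illustrates how scores could be accumulated into a saliency map, not the exact published formulation.

```python
# Hedged sketch: accumulate learned subvolume scores into a dense
# space-time saliency map over a clip of shape (T, H, W).
import numpy as np

def saliency_map(video_shape, subvolumes):
    """video_shape: (T, H, W); subvolumes: list of ((t0, t1, y0, y1, x0, x1), score)."""
    sal = np.zeros(video_shape, dtype=np.float32)
    for (t0, t1, y0, y1, x0, x1), score in subvolumes:
        sal[t0:t1, y0:y1, x0:x1] += score        # each subvolume votes with its score
    if sal.max() > 0:
        sal /= sal.max()                          # normalise to [0, 1]
    return sal
```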

In this EPSRC-funded project we studied the design of a novel class of multilinear/tensorial classifiers able to linearly model the influence of several covariate factors, as part of a robust approach to identity recognition from gait.

While at INRIA Rhône-Alpes, Prof Cuzzolin studied the use of Laplacian embeddings in the analysis of human motion, both for automated body-part segmentation and tracking, and for the robust matching of deformable bodies.
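As a simplified illustration of the Laplacian embedding idea (not the original INRIA pipeline), the sketch below builds a k-nearest-neighbour graph over body points, embeds them with the low-order Laplacian eigenvectors, and clusters the embedding into body parts. The graph construction and all parameters are illustrative assumptions.

```python
# Illustrative Laplacian-embedding sketch for body-part segmentation.
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import KMeans

def segment_body(points, n_parts=5, k=8):
    """points: (N, 3) voxel/point coordinates of a reconstructed body."""
    W = kneighbors_graph(points, k, mode='connectivity', include_self=False)
    W = 0.5 * (W + W.T)                           # symmetrise the adjacency graph
    L = laplacian(W, normed=True)
    # embed each point using the first few non-trivial Laplacian eigenvectors
    vals, vecs = np.linalg.eigh(L.toarray())
    embedding = vecs[:, 1:n_parts + 1]
    # cluster the embedded points into body parts
    return KMeans(n_clusters=n_parts, n_init=10).fit_predict(embedding)
```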

In this Knowledge Transfer Partnership, the lab collaborated with Meta Vision LTD on the design and implementation of an inspection framework for the detection and localisation of weld defects from reconstructed 3D surfaces.

Example-based pose estimation
Gesture recognition from depth cameras

In example-based pose estimation, the configuration or "pose" of an evolving object is sought given visual evidence, relying solely on a set of examples. A sensible approach consists in learning maps from features to poses using the information provided by the training set, and fusing features expressed as belief functions. We call this approach Belief Modelling Regression.
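As background, evidence from different features expressed as belief functions can be fused with Dempster's rule of combination. The toy sketch below shows that standard rule on hypothetical sets of candidate poses; the frame of discernment, the mass assignments and the downstream regression to poses are placeholders, and Belief Modelling Regression itself involves further steps.

```python
# Minimal sketch: fusing two mass functions with Dempster's rule of combination.
def dempster_combine(m1, m2):
    """m1, m2: dicts mapping frozenset hypotheses (sets of candidate poses)
    to mass values summing to 1."""
    combined, conflict = {}, 0.0
    for A, a in m1.items():
        for B, b in m2.items():
            inter = A & B
            if inter:
                combined[inter] = combined.get(inter, 0.0) + a * b
            else:
                conflict += a * b                 # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence")
    # renormalise by the non-conflicting mass
    return {A: v / (1.0 - conflict) for A, v in combined.items()}

# Example: two features each constrain the set of plausible training poses.
m_feature1 = {frozenset({'pose1', 'pose2'}): 0.7,
              frozenset({'pose1', 'pose2', 'pose3'}): 0.3}
m_feature2 = {frozenset({'pose2', 'pose3'}): 0.6,
              frozenset({'pose1', 'pose2', 'pose3'}): 0.4}
print(dempster_combine(m_feature1, m_feature2))
```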

Thanks to visiting students De Rosa and Jetley, we also worked on action and gesture recognition from depth cameras, in particular via a tessellation of local classifiers in the feature space which locally approximates the optimal Bayes classifier.
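A hypothetical sketch of such a tessellation is given below: the feature space is partitioned with k-means, and a simple local classifier is fitted inside each cell; test samples are routed to the classifier of their nearest centroid. The models, parameters and integer-label assumption are illustrative, not those of the published work.

```python
# Hypothetical sketch of a tessellation of local classifiers in feature space.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

class LocalClassifierTessellation:
    def __init__(self, n_cells=10):
        self.tess = KMeans(n_clusters=n_cells, n_init=10)   # partitions the feature space
        self.local = {}                                      # one local model per cell

    def fit(self, X, y):
        cells = self.tess.fit_predict(X)
        for c in np.unique(cells):
            idx = cells == c
            if len(np.unique(y[idx])) > 1:
                # fit a local model where more than one class occurs in the cell
                self.local[c] = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
            else:
                self.local[c] = int(y[idx][0])               # degenerate cell: constant label
        return self

    def predict(self, X):
        cells = self.tess.predict(X)
        out = np.empty(len(X), dtype=int)
        for i, c in enumerate(cells):
            model = self.local[c]
            out[i] = model.predict(X[i:i + 1])[0] if hasattr(model, 'predict') else model
        return out
```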