Grants and Funding

*NEW* ECM funded Research Fellow

November 2018 - October 2020
Budget: £100,000
Own role: Principal Investigator

The Visual AI Laboratory, in partnership with Dr Matthias Rolf of the Cognitive Robotics group and the Autonomous Driving group led by Dr Andrew Bradley, has secured funding of £100,000 from the School of Engineering, Computing and Mathematics to support a Research Fellow in Artificial Intelligence for Autonomous Driving for a period of two years.

The project concerns the design and development of novel ways for robots and autonomous machines to interact with humans in a variety of emerging scenarios, including human-robot interaction, autonomous driving and personal (virtual or robotic) assistants. In particular, we believe novel, disruptive applications of AI require much more sophisticated forms of communication between humans and machines, going far beyond the conventional explicit, linguistic exchange of information towards implicit non-verbal communication and understanding of each other's behaviour.
For example, smart cars need to understand that children and construction workers have different reasoning processes that lead to very different observable behaviour, in order to blend in with the road as a human-centered environment. Empathic machines have the potential to revolutionise healthcare, by providing better care catering for the psychological needs of patients. Morally and socially appropriate behaviour is key in all such scenarios, to build trust and lead to acceptance from the public.
Exciting research is currently going on in moral robotics and AI, including moral development (how a robot can learn moral principles) and fairness and bias in, for instance, AI-assisted recruitment. As smart cars head towards real-world deployment, the field is shifting from mere perception (e.g. SLAM) to higher-level cognition tasks, starting from the automated detection of road events. Holographic AI is going to revolutionise the field of personal assistants, but needs effective communication interfaces.

A Research Fellow Grade 8 position (starting salary: £30,688) will be advertised as soon as September 2018.

Material and resources:

*NEW* Knowledge Transfer Partnership with Createc and Sportslate

September 2018 - August 2020
Budget: £190,000
Own role: Academic Supervisor and Lead Academic

A Knowledge Transfer Partnership (KTP) with Createc and Sportslate, two successful spinoffs of Oxford University, was funded by Innovate UK in its latest round.

The project is split into two key phases each taking approximately 12 months, aiming to demonstrate a simple proof of concept at the mid-point with the second year focused on maturation, refinement and steps to commercialisation. The first phase will consist of the Associate reviewing the state of the art and conducting a literature review, understanding the hardware and system architecture and capturing further datasets for algorithmic training, in addition to the following technical work packages:
  1. Sensor fusion: The company's system provides not only video imagery from multiple viewpoints but also data providing depth, dynamic data and point cloud overlays over the imagery. This enables a novel approach to action identification, in which this extra information can be integrated with the video to enhance performance.
  2. Person segmentation: The first task ahead of person or action identification is to segment the person from the background which, due to the tracking system, is highly dynamic. This is a key enabling task, but there are multiple existing techniques for performing it.
  3. Person identification: It is important for all applications to associate an action with an individual. In the crowd monitoring case, single actions may be inconsequential, but an individual carrying out multiple actions may be of more interest.
  4. Single person action identification: This task will develop algorithms for identifying single person actions from the video data.
These will be integrated for a proof of concept demonstration in month 13. The second phase of the work will integrate the algorithms with real customer datasets and other datasets held by Createc, enabling testing of the algorithms under a wide range of conditions. Inevitably, this will lead to algorithm refinement. This work is important to demonstrate that the approaches can be used commercially with real data, thereby de-risking commercial exploitation beyond this project. Technically, this phase will also extend single person action identification to multi-person events, and enable the system to understand these links.
Towards the end of the project, the algorithms and capabilities will be marketed to prospective customers, and the Associate will work on development of marketing material, videos and academic papers/presentations to raise the profile of the work.

A KTP Associate position will be advertised as soon as September 2018. Salary will be in the range 30,000 - 35,000 per annum.

*NEW* SARAS - Smart Autonomous Robotic Assistant Surgeon

January 2018 - December 2020
Coordinator: Dr Riccardo Muradore, University of Verona, Italy
Budget: €4,315,640 (Oxford Brookes' share: €596,073)
Own role: Scientific Officer (SO) for the whole project, as well as WP Leader

In surgical operations many people crowd the area around the operating table. The introduction of robotics in surgery has not decreased this number. During a laparoscopic intervention with the da Vinci robot, for example, the presence of an assistant surgeon, two nurses and an anaesthetist is required, together with that of the main surgeon teleoperating the robot. The assistant surgeon must always be present to take care of simple surgical tasks the main surgeon cannot perform with the robotic tools s/he is teleoperating (e.g. suction and aspiration during dissection, moving or holding organs in place to make room for cutting or suturing, using the standard laparoscopic tools). Another expert surgeon is thus required to play the role of the assistant, to properly support the main surgeon using traditional laparoscopic tools as shown in Figure 1.

The goal of SARAS is to develop a next-generation surgical robotic platform that allows a single surgeon (i.e., without the need for an expert assistant surgeon) to execute robotic minimally invasive surgery (R-MIS), thereby increasing the social and economic efficiency of a hospital while guaranteeing the same level of safety for patients. This platform is called the solo-surgeon system.

Material and resources:

Knowledge Transfer Partnership with Meta Vision LTD

September 2015 - August 2017
Budget: £160,000
Own role: Academic supervisor

In the welding industry, we see an increasing need for automated inspection, both in partnership with automated seam tracking and as a completely separate function.
The aim of this project is to develop algorithms for computer vision capable of analysing 3D scans of robotic welds, by extracting underlying geometry, identifying a range of standard defects, and classifying the welds as acceptable or not according to geometrical definitions.
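As a rough illustration of what classifying a weld "according to geometrical definitions" could look like, the sketch below checks a single cross-section profile against two limits. It is not the project's algorithm: the profile representation, the two defect types checked and the tolerance values are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class WeldTolerances:
    # Hypothetical limits in millimetres, chosen only for illustration;
    # they are not taken from any welding standard or from the project.
    max_reinforcement_mm: float = 3.0
    max_undercut_mm: float = 0.5


def find_defects(heights_mm: Sequence[float],
                 tol: Optional[WeldTolerances] = None) -> List[str]:
    """Return the defects found in one cross-section profile.

    heights_mm holds surface heights relative to the plate surface (0.0):
    positive values are weld metal above the plate, negative values are
    grooves below it.
    """
    if len(heights_mm) == 0:
        raise ValueError("profile must contain at least one sample")
    tol = tol or WeldTolerances()

    # Extract two simple geometric measurements from the profile.
    reinforcement = max(max(heights_mm), 0.0)
    undercut = max(-min(heights_mm), 0.0)

    defects = []
    if reinforcement > tol.max_reinforcement_mm:
        defects.append("excess reinforcement")
    if undercut > tol.max_undercut_mm:
        defects.append("undercut")
    return defects


def is_acceptable(heights_mm: Sequence[float],
                  tol: Optional[WeldTolerances] = None) -> bool:
    """A weld profile is acceptable when no defect is detected."""
    return not find_defects(heights_mm, tol)
```

A real inspection system would work on full 3D scans and a wider range of defect definitions; the point here is only that each defect reduces to a measurement compared against a geometric limit.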

Three key stages of the project can be identified:
  • Performing automatic analysis of 3D data requires an in-depth understanding and application of the underlying mathematics involved. It will be necessary to use this knowledge to define the basis for the operation of the algorithms.

  • The second step will be to turn this mathematical development into a set of algorithms for matching the 3D datasets of actual parts to be inspected to either theoretical models of good and bad welds or stored, processed 3D models of good and bad welds, thereby determining the overall quality of the weld in question and identifying any particular defects.

  • To support the first two items above, it may be necessary to have a database which extracts key geometric information about good and bad shapes and makes that available to the inspection algorithms themselves. The database will also store basic 3D representations of complete parts.

News and project website:

Online action recognition for human-robot interaction

Oxford Brookes University: 150th Anniversary Scholarship

September 2015 - March 2019
Budget: 1 PhD studentship
Own role: Director of studies
Personnel: Gurkirt Singh

Action recognition is a fast-growing area of research in computer vision. Given a video captured by one or more cameras, the problem consists in detecting and recognising the category of the action performed by the person(s) who appear in it. The problem is very challenging for a number of reasons: labelling videos is an ambiguous task, as the same sequence can be assigned different verbal descriptions by different human observers; different motions can carry the same meaning (inherent variability); and nuisance factors such as viewpoint, illumination variations and occlusion (as parts of the moving person can be hidden behind objects or other people) further complicate recognition. In addition, traditional action recognition benchmarks are based on a ‘batch’ philosophy: it is assumed that a single action is present within each video, and videos are processed as a whole, typically via algorithms which require entire days to complete. This may be acceptable for tasks such as video browsing and retrieval over the internet (although speed is a huge issue there), but is completely unacceptable for a number of real-world applications which require a prompt, real-time interpretation of what is going on.

Consequently, a new paradigm of ‘online’, ‘real-time’ action recognition is rapidly emerging, and is likely to shape the field in coming years. The AI and Vision group is already building on its multi-year experience in batch action recognition to expand towards online recognition, based on two distinct approaches: one based on the application of novel ‘deep learning’ neural networks to automatically segmented video regions, the other resting on continually updating an approximation of the space of ‘feature’ measurements extracted from images, via a set of balls whose radius depends on how difficult classification is within that region of the space. Investing in online action recognition is crucial to maintain and further improve Brookes’ reputation in human action classification, and to face the fierce international competition on the topic.
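To make the second approach more concrete, here is a minimal sketch of an online classifier that covers the feature space with labelled balls and shrinks a ball when a conflicting label arrives inside it. This is an illustrative guess at the general idea, not the group's actual method: the class names, the update rule, the halving factor and the use of Euclidean distance are all assumptions.

```python
import math
from typing import List, Optional, Sequence


class Ball:
    """A labelled ball in feature space (hypothetical helper class)."""

    def __init__(self, centre: Sequence[float], radius: float, label: str):
        self.centre = list(centre)
        self.radius = radius
        self.label = label
        self.support = 1  # number of agreeing samples seen so far

    def distance(self, x: Sequence[float]) -> float:
        return math.dist(self.centre, x)


class BallCover:
    """Online approximation of the feature space by a set of balls.

    Simplification: only the ball with the nearest centre is consulted,
    even if another ball also contains the point.
    """

    def __init__(self, initial_radius: float = 1.0, shrink: float = 0.5):
        self.initial_radius = initial_radius
        self.shrink = shrink
        self.balls: List[Ball] = []

    def _nearest(self, x: Sequence[float]) -> Optional[Ball]:
        if not self.balls:
            return None
        return min(self.balls, key=lambda b: b.distance(x))

    def predict(self, x: Sequence[float]) -> Optional[str]:
        ball = self._nearest(x)
        return ball.label if ball else None

    def update(self, x: Sequence[float], label: str) -> None:
        ball = self._nearest(x)
        if ball is None or ball.distance(x) > ball.radius:
            # Uncovered region: open a new ball with the default radius.
            self.balls.append(Ball(x, self.initial_radius, label))
        elif ball.label == label:
            # Easy region: the existing ball already explains the sample.
            ball.support += 1
        else:
            # Conflicting label: classification is hard here, so shrink the
            # ball and describe the region with a second, smaller ball.
            ball.radius *= self.shrink
            self.balls.append(Ball(x, ball.radius, label))
```

In this sketch, regions where labels conflict end up covered by many small balls, while easy regions keep a few large ones, so the model size adapts to local difficulty as samples stream in.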

Papers and Posters:

Code and resources:

Uncertainty in Computer Vision

Faculty of Technology, Design and Environment: Next 10 Award

September 2014 - February 2018
Budget: 1 PhD studentship
Own role: Director of studies
Personnel: Suman Saha

In recent years, “online action detection” has attracted a lot of attention in the computer vision community due to its far-reaching real-world applications, such as human-robot interaction, autonomous surveillance, computer gaming and virtual environments, automated vehicle driving and biometric gait identification. Here, the goal is to detect multiple human actions from online videos. In offline action detection, the system has full access to the video, whereas in the online case the detection model only has access to the present and previous frames, and thus the detection task is more challenging. Current state-of-the-art detection systems demonstrate promising results for offline applications such as video indexing. However, the computer vision community is still striving to build a robust online action recognition system which can perform in real time.
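The difference between the two settings can be illustrated with a toy example that smooths per-frame action scores. The functions and the use of a moving average are assumptions chosen for illustration, not the detection model used in this project: the offline version averages over a window centred on each frame (so it looks into the future), while the online version only uses the present and previous frames.

```python
from typing import List, Sequence


def offline_smooth(scores: Sequence[float], half_window: int) -> List[float]:
    """Centred moving average: frame t uses frames on both sides of t."""
    n = len(scores)
    smoothed = []
    for t in range(n):
        lo = max(0, t - half_window)
        hi = min(n, t + half_window + 1)  # includes future frames
        window = scores[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def online_smooth(scores: Sequence[float], window: int) -> List[float]:
    """Causal moving average: frame t uses only frames up to and including t."""
    smoothed = []
    for t in range(len(scores)):
        lo = max(0, t - window + 1)
        past = scores[lo:t + 1]  # no access to frames after t
        smoothed.append(sum(past) / len(past))
    return smoothed
```

Because the online version is causal, its output for a given frame does not change when later frames arrive, which is what allows a result to be emitted as soon as the frame is seen. The offline output for the same frame can change once the rest of the video becomes available.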

Papers and Posters:
Code and resources:

Recognising and Localising Human Actions

Oxford Brookes University: Doctoral School on "Intelligent Transport Systems" (ITS)

October 2011 - October 2014
Budget: 1 PhD studentship
Own role: Director of studies
Personnel: Michael Sapienza

Human action recognition in challenging video data is becoming an increasingly important research area, given the huge amounts of user-generated content uploaded to the Internet each day. The detection of human actions will facilitate automatic video description and organisation, as well as online search and retrieval. Furthermore, for the Intelligent Transport Systems (ITS) autonomous vehicle to drive safely in urban environments, it must learn to recognise and quickly react to human actions.
Giving machines the capability to recognise human actions from videos poses considerable challenges. The captured videos are often of low quality and contain unconstrained camera motion, zoom and shake. In addition, human actions interpreted as space-time sequences trace a very flexible structure, with variations in viewpoint, pose, scale and illumination. Apart from these nuisance factors, actions inherently possess a high degree of within-class variability: for example, a walking motion may vary in stride, pace and style, yet remain the same action. Creating action models which can cope with this variability, while being able to discriminate between a significant number of action classes, is a serious challenge. In the first 15 months of this research project, entitled "Recognising and localising human actions", we have made significant steps to tackle this challenge.

Papers and Posters:
Code and resources:

Tensorial modeling of dynamical systems for gait and activity recognition

August 2011 - January 2014
Budget: £122,000
Own role: Principal Investigator (PI)
Personnel: Dr Wenjuan Gong

Case for Support

Biometrics such as face, iris, or fingerprint recognition for surveillance and security have received growing attention in the last decade. They suffer, however, from two major limitations: they cannot be used at a distance, and they require user cooperation. For these reasons, originally driven by an initiative of the US agency DARPA, identity recognition from gait has been proposed as a novel behavioral biometric, based on people’s distinctive gait patterns.
Despite its attractive features, though, gait identification is still far from being ready to be deployed in practice, as in real-world scenarios recognition is made extremely difficult by the presence of nuisance factors such as viewpoint, illumination, clothing, etcetera. Similar issues are shared by other applications such as action and activity recognition.
This proposal concerns the problem of classifying video sequences by attributing to each sequence a label, such as the type of event recorded or the identity of the person performing a certain action. It proposes a novel framework for motion recognition capable of dealing in a principled way with the issue of nuisance factors in both gait and activity recognition. The goal is pushing towards a more widespread diffusion of gait identification, as a concrete contribution to enhancing the security levels in the country in the current, uncertain scenarios. However, as the techniques devised in this proposal are extendable to action and identity recognition, their commercial exploitation potential in, for instance, video indexing or interactive video games is also enormous.

Code and resources: