Research Theme: Human-Robot Interaction
Action recognition is a fast-growing area of research in computer vision. Given a video captured by one or more cameras, the task is to detect and recognise the category of the action performed by the person or people appearing in it.
The problem is very challenging for a number of reasons: labelling videos is an ambiguous task, as the same sequence can be assigned different verbal descriptions by different human observers; different motions can carry the same meaning (inherent variability); and nuisance factors such as viewpoint, illumination changes, and occlusion (parts of the moving person can be hidden behind objects or other people) further complicate recognition.
In addition, traditional action recognition benchmarks follow a 'batch' philosophy: a single action is assumed to be present in each video, and videos are processed as a whole, typically by algorithms that can take days to run. This may be acceptable for tasks such as video browsing and retrieval over the internet (although speed matters there too), but it is unacceptable for the many real-world applications that require a prompt, real-time interpretation of what is going on.

Examples include: human-robot and human-machine interaction (using gestures to send commands to a computer or a robot), surveillance (detecting potentially dangerous actions or events in live feeds), driver monitoring (assessing the driver's level of attention, or responding to gestural commands), gaming (interpreting the body language of a video game player), and intelligent vehicles (understanding the behaviour of pedestrians and other vehicles in the vicinity of a car).

Consequently, a new paradigm of 'online', 'real-time' action recognition is rapidly emerging, and is likely to shape the field in the coming years. The AI and Vision group is building on its multi-year experience in batch action recognition to expand towards online recognition, following two distinct approaches: the first applies novel 'deep learning' neural networks to automatically segmented video regions; the second continually updates an approximation of the space of 'feature' measurements extracted from images, using a set of balls whose radius depends on how difficult classification is within that region of the space.
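As a rough illustration of the second approach, the sketch below maintains a growing set of labelled balls in feature space and shrinks a ball's radius whenever a differently labelled sample lands inside it, so that difficult regions end up covered by smaller balls. This is only an illustrative sketch under simple assumptions: the class name, the fixed default radius, and the shrinking heuristic are ours, not the group's actual method.

```python
# Minimal sketch of an online ball-cover classifier (illustrative only):
# balls carry a centre, a radius, and an action label; radii shrink where
# classification proves difficult, refining the cover in those regions.
import numpy as np

class BallCoverClassifier:
    def __init__(self, default_radius=1.0, shrink=0.5):
        self.centres = []                    # ball centres in feature space
        self.radii = []                      # one radius per ball
        self.labels = []                     # action label attached to each ball
        self.default_radius = default_radius # radius assigned to new balls (assumed fixed)
        self.shrink = shrink                 # shrink factor applied on mistakes

    def _nearest(self, x):
        # Index of and distance to the ball whose centre is closest to x.
        dists = [np.linalg.norm(x - c) for c in self.centres]
        i = int(np.argmin(dists))
        return i, dists[i]

    def predict(self, x):
        # Label of the nearest ball centre (None before any update).
        if not self.centres:
            return None
        i, _ = self._nearest(x)
        return self.labels[i]

    def update(self, x, y):
        # Online update with one labelled feature vector (x, y).
        if self.centres:
            i, d = self._nearest(x)
            if d <= self.radii[i]:
                if self.labels[i] == y:
                    return  # correctly covered: nothing to do
                # Covered by a ball of the wrong class: classification is
                # hard in this region, so shrink that ball before adding
                # a new, correctly labelled one around x.
                self.radii[i] *= self.shrink
        # Not covered (or covered by the wrong class): open a new ball.
        self.centres.append(np.asarray(x, dtype=float))
        self.radii.append(self.default_radius)
        self.labels.append(y)

# Toy usage on two-dimensional features:
clf = BallCoverClassifier()
clf.update(np.array([0.0, 0.0]), "wave")
clf.update(np.array([5.0, 5.0]), "point")
print(clf.predict(np.array([0.3, -0.2])))  # -> "wave"
```

In this toy version, prediction simply returns the label of the nearest ball centre; a real online recogniser would instead operate on feature measurements extracted from live video frames.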
Funded by: Oxford Brookes University 150th Anniversary Scholarship

Lab Member(s): Gurkirt Singh