Research Project: Action Recognition from Depth
Online gesture recognition via nonparametric incremental learning
Motion and shape history templates for gesture recognition from depth cameras
Gesture Recognition from Depth Data for Human-Robot Interaction

We introduce an online action recognition system that can be combined with any set of frame-by-frame feature descriptors. Our system covers the frame feature space with classifiers whose distribution adapts to the hardness of locally approximating the Bayes optimal classifier. An efficient nearest-neighbour search is used to find and combine the local classifiers closest to the frames of a new video to be classified. The advantages of our approach are: incremental training, frame-by-frame real-time prediction, nonparametric predictive modelling, video segmentation for continuous action recognition, no need to trim videos to equal lengths, and a single tuning parameter (which, for large datasets, can safely be set to the diameter of the feature space). Experiments on standard benchmarks show that our system is competitive with state-of-the-art non-incremental and incremental baselines.
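The prediction step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: local classifiers are simplified to (centre, label-histogram) prototypes, and the class names and the brute-force nearest-neighbour search are assumptions for clarity.

```python
import numpy as np

class LocalClassifierIndex:
    """Hypothetical sketch: each local classifier is a prototype with a
    position in feature space and a histogram of class-label counts."""

    def __init__(self):
        self.centres = []      # feature-space positions of local classifiers
        self.histograms = []   # per-classifier class-label counts

    def add(self, centre, histogram):
        # Incremental training: new local classifiers can be added at any time.
        self.centres.append(np.asarray(centre, dtype=float))
        self.histograms.append(np.asarray(histogram, dtype=float))

    def predict_video(self, frames, k=3):
        """Combine the votes of the k nearest local classifiers of every
        frame, then return the class with the most accumulated votes."""
        centres = np.stack(self.centres)
        votes = np.zeros_like(self.histograms[0])
        for f in frames:
            dists = np.linalg.norm(centres - f, axis=1)  # brute-force NN search
            for i in np.argsort(dists)[:k]:
                votes += self.histograms[i]
        return int(np.argmax(votes))
```

Because votes accumulate frame by frame, a running prediction is available at any point in the video, which is what enables real-time, per-frame output.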

We propose a global descriptor for characterising depth sequences that is accurate, compact and easy to compute compared to the state of the art. The activity enactment video is divided into temporally overlapping blocks. Each block (a set of image frames) is used to generate Motion History Templates (MHTs) and Binary Shape Templates (BSTs) over three different views: front, side and top. The three views are obtained by projecting each video frame onto three mutually orthogonal Cartesian planes. MHTs are assembled by stacking the differences of consecutive frame projections in a weighted manner, separately for each view. Histograms of oriented gradients are computed and concatenated to represent the motion content. Shape information is obtained through a similar gradient analysis over the BSTs.
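The weighted stacking of consecutive frame-projection differences can be sketched with the classic motion-history update (linear decay, full weight for the most recent motion). This is an assumed, simplified weighting scheme for one view, not necessarily the exact formulation of the paper; the function name and the binary-difference threshold are illustrative.

```python
import numpy as np

def motion_history_template(projections, tau=None):
    """Sketch of an MHT for one view: pixels where consecutive
    projections differ are set to the maximum weight tau, while
    previously moving pixels decay linearly over time."""
    T = len(projections)
    tau = tau if tau is not None else T  # decay horizon (assumption: block length)
    mht = np.zeros_like(projections[0], dtype=float)
    for t in range(1, T):
        # binary change mask between consecutive frame projections
        diff = np.abs(projections[t].astype(float)
                      - projections[t - 1].astype(float)) > 0
        # recent motion gets full weight; older motion fades
        mht = np.where(diff, tau, np.maximum(mht - 1.0, 0.0))
    return mht / tau  # normalise weights to [0, 1]
```

A histogram of oriented gradients computed over this template then summarises where and how recently motion occurred in that view.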

The main aim of this project is to produce a system that uses depth information from a Kinect 2.0 sensor to recognise human gestures in real time, for the purpose of human-robot interaction. The method I used was to first normalise the skeletal data provided by the Kinect by translating and scaling it. I then created a 20-dimensional feature vector of 3D distances between selected skeletal body joints and 2D angles between selected bones in the body (angles between pairs of vectors). Next, I created a forward HMM with 7 hidden states and custom transition and emission probability matrices to represent each gesture class. The HMMs were then trained on a set of gesture sequences from the newly available NTU RGB+D dataset by applying the Baum-Welch algorithm, in order to learn the transition probabilities of each HMM.
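The per-frame feature extraction described above can be sketched as below. The specific joint and bone pairs are illustrative (the dissertation selects 20 such features from the Kinect skeleton), and the angles are computed here in 3D for simplicity, whereas the dissertation uses 2D angles; function names are assumptions.

```python
import numpy as np

def bone_angle(u, v):
    """Angle (radians) between two bone vectors."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(c, -1.0, 1.0)))  # clip guards rounding error

def frame_features(joints, dist_pairs, bone_pairs):
    """Build one frame's feature vector: selected joint-to-joint
    distances followed by angles between selected bones.
    `joints` is an (N, 3) array of normalised joint positions."""
    joints = np.asarray(joints, float)
    feats = [float(np.linalg.norm(joints[a] - joints[b])) for a, b in dist_pairs]
    for (a, b), (c, d) in bone_pairs:
        # each bone is the vector from its parent joint to its child joint
        feats.append(bone_angle(joints[b] - joints[a], joints[d] - joints[c]))
    return np.array(feats)
```

One such vector per frame forms the observation sequence that each gesture's HMM is trained on with Baum-Welch; at recognition time, the class whose HMM gives the new sequence the highest likelihood wins.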

Relevant papers:
  • Rocco de Rosa, Ilaria Gori, Fabio Cuzzolin and Nicolò Cesa-Bianchi
    Active Incremental Recognition of Human Activities in a Streaming Context
    Pattern Recognition Letters, Volume 99, pages 48-56, November 2017
  • Alessandro Antonucci, Rocco de Rosa, Alessandro Giusti and Fabio Cuzzolin
    Robust classification of multivariate time series by imprecise hidden Markov models
    International Journal of Approximate Reasoning, Volume 56, Part B, pages 249-263, January 2015
  • Rocco de Rosa, Ilaria Gori, Nicolò Cesa-Bianchi and Fabio Cuzzolin
    Online Action Recognition via Nonparametric Incremental Learning
    BMVC 2014, September 2014
  • Saumya Jetley and Fabio Cuzzolin
    3D activity recognition using gradient analysis consolidated over motion history and binary shape templates
    ACCV 2014 - Workshop on Human Gait and Action Analysis in the Wild: Challenges and Applications, Singapore, November 1 2014
  • Ben Guy
    Gesture Recognition from Depth Data for Human-Robot Interaction
    Supervisor: Prof Fabio Cuzzolin
    MSc Dissertation, Oxford Brookes University, September 2016

Lab Member(s): Rocco de Rosa, Saumya Jetley, Ben Guy