We introduce an online action recognition system that can be combined with any set
of frame-by-frame feature descriptors. Our system covers the frame feature space with
classifiers whose distribution adapts to the hardness of locally approximating the Bayes
optimal classifier. An efficient nearest neighbour search is used to find and combine the
local classifiers that are closest to the frames of a new video to be classified. The advantages
of our approach are: incremental training, frame by frame real-time prediction,
nonparametric predictive modelling, video segmentation for continuous action recognition,
no need to trim videos to equal lengths and only one tuning parameter (which,
for large datasets, can be safely set to the diameter of the feature space). Experiments
on standard benchmarks show that our system is competitive with state-of-the-art non-incremental
and incremental baselines.
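The idea of covering the frame feature space with labelled local classifiers and combining the nearest ones at test time can be sketched as follows. This is a deliberately simplified illustration, not the paper's actual algorithm: the class name `LocalClassifierBank`, the fixed `k`, and the majority-vote combination are all assumptions made here for clarity.

```python
import numpy as np

class LocalClassifierBank:
    """Toy sketch: labelled frame descriptors act as local classifiers,
    and a new video is labelled by combining the stored frames nearest
    to each of its frames (a simple k-NN vote)."""

    def __init__(self, k=3):
        self.k = k
        self.feats = []   # stored frame descriptors
        self.labels = []  # action label of each stored frame

    def add_frame(self, feat, label):
        # Incremental training: simply remember the labelled frame.
        self.feats.append(np.asarray(feat, dtype=float))
        self.labels.append(label)

    def predict_video(self, frames):
        # Frame-by-frame prediction: each frame votes via its k
        # nearest stored descriptors; votes are pooled over the video.
        X = np.stack(self.feats)
        votes = {}
        for f in frames:
            d = np.linalg.norm(X - np.asarray(f, dtype=float), axis=1)
            for i in np.argsort(d)[: self.k]:
                votes[self.labels[i]] = votes.get(self.labels[i], 0) + 1
        return max(votes, key=votes.get)
```

Because training only appends to the stored set, new labelled videos can be absorbed at any time, which mirrors the incremental-training property claimed above.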
We propose a global descriptor for characterizing depth sequences that is accurate, compact and easy to compute compared to the state of the art. Each activity-enactment video is divided into temporally overlapping blocks, and each block (a set of image frames) is used to generate
Motion History Templates (MHTs) and Binary Shape Templates (BSTs) over three views: front, side and top. The three views
are obtained by projecting each video frame onto three mutually orthogonal Cartesian planes. MHTs are assembled by stacking the difference
of consecutive frame projections in a weighted manner separately for each view. Histograms of oriented gradients are computed and concatenated
to represent the motion content. Shape information is obtained through a similar gradient analysis over BSTs.
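The weighted stacking of consecutive frame differences that builds an MHT can be sketched in the standard motion-history style; the exact weighting scheme of the original descriptor may differ, so the linear recency weighting below is an assumption.

```python
import numpy as np

def motion_history_template(frames):
    """Stack absolute differences of consecutive frame projections,
    weighting recent motion more heavily: a pixel that moved at step t
    is stamped with t / (n - 1), so later motion overwrites earlier
    motion with a larger value, producing a temporal gradient."""
    frames = [np.asarray(f, dtype=float) for f in frames]
    n = len(frames)
    mht = np.zeros_like(frames[0])
    for t in range(1, n):
        moved = np.abs(frames[t] - frames[t - 1]) > 0
        mht[moved] = t / (n - 1)
    return mht
```

Histograms of oriented gradients would then be computed over templates like this one (per view) and concatenated, as described above.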
The main aim of this project is to produce a system that utilises depth information from a Kinect 2.0 sensor to recognise human gestures in real time, for the purpose of human-robot interaction.
To achieve this, I first normalized the skeletal data provided by the Kinect by translating and scaling it. I then created a 20-dimensional
feature vector of 3D distances between selected skeletal body joints and 2D angles
between selected bones in the body (angles between pairs of vectors).
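The building blocks of such a feature vector can be sketched as below. The choice of root joint and scaling pair is an assumption made for illustration; the report does not specify which joints were used.

```python
import numpy as np

def normalize_skeleton(joints, root_idx=0, scale_pair=(0, 1)):
    """Translate so the root joint sits at the origin, then scale by
    the distance between a chosen joint pair (indices illustrative)."""
    J = np.asarray(joints, dtype=float)
    J = J - J[root_idx]
    s = np.linalg.norm(J[scale_pair[0]] - J[scale_pair[1]])
    return J / s if s > 0 else J

def joint_distance(a, b):
    """3D Euclidean distance between two joints."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) -
                                np.asarray(b, dtype=float)))

def bone_angle(u, v):
    """Angle in radians between two bone vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))
```

Concatenating a fixed set of such distances and angles per frame would yield the 20-dimensional descriptor described above.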
I then created a forward (left-to-right) HMM, with 7 hidden states and custom transition and
probability matrices, to represent each gesture class. The HMMs were then trained
on a set of training gesture sequences from the newly available NTU RGB+D
dataset, by applying the Baum-Welch algorithm to the gesture sequences, in order to
learn the transition probabilities of each HMM.
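The left-to-right transition structure and the likelihood evaluation used at recognition time can be sketched as follows. Baum-Welch training itself is omitted for brevity, and the discrete emissions and uniform 0.5 stay/advance initialization are assumptions made here; the report's actual matrices are custom.

```python
import numpy as np

def left_to_right_transitions(n_states=7):
    """Initial transition matrix for a forward (left-to-right) HMM:
    each state may stay put or advance to the next state; the final
    state absorbs. Baum-Welch would refine these probabilities."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        A[i, i] = A[i, i + 1] = 0.5
    A[-1, -1] = 1.0
    return A

def forward_loglik(obs, A, B, pi):
    """Scaled forward algorithm: log P(obs | model) for a discrete-
    emission HMM. At test time, the gesture class whose trained HMM
    scores the new sequence highest would be chosen."""
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    loglik = np.log(c)
    alpha /= c
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()
        loglik += np.log(c)
        alpha /= c
    return float(loglik)
```

In practice a library such as hmmlearn could supply both the Baum-Welch fitting and this scoring step.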