Human motion analysis is a crucial topic in computer vision with applications in surveillance, human-machine interfaces and animation, amongst others, and has received considerable attention from the community over the last decades [Moeslund]. Model-based approaches [Hogg,Davis] assume a known, often kinematic, model of the human body and recover the parameters of this model in a joint space using image information; such a search is difficult, however, without adequate initialization. Learning-based approaches [Brand,Grauman,Elgammal], which directly relate visual information to learned body configurations, are not affected by the initialization issue but are limited by their reliance on training sets of examples. In contrast, techniques have been proposed that directly infer body poses from multiple image cues or volume sequences: skeletonization methods, for instance, recover the intrinsic articulated structure of 3D shapes, either directly in 3D [Brostow] or in an embedded space [Jenkins,Sundaresan]. Spectral embeddings [Belkin,Tenenbaum] can indeed map 3D shapes onto low-dimensional manifolds, thus naturally revealing the intrinsic structure of an articulated shape. A critical issue is the presence of topological ambiguities caused by self-contacts, as noted by Sundaresan and Chellappa [Sundaresan], who used an a priori graphical model to resolve them.
We propose instead a spectral approach that segments body parts in 3D body shape sequences without any a priori information or learned examples, while seeking robustness to topological ambiguities over time. Recent attempts to extend nonlinear dimensionality reduction to spatio-temporal data [Jenkins,Lin] rely on enforcing temporal relationships when embedding time sequences, a hard task when handling dense volume representations. We instead propose a mechanism that enforces temporal consistency of the segments obtained by collinear clustering in the embedded space, where clusters are remarkably stable under articulated motions, and propagates them over time. We favor Locally Linear Embedding (LLE) [Roweis], as it exhibits a number of desirable features in the specific scenario of unsupervised segmentation: it preserves shape protrusions by virtue of its local isometry, while its covariance constraint increases their separation and reduces their intrinsic dimensionality.
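To make the embedding step concrete, here is a minimal sketch of the standard LLE algorithm applied to a small 3D point set. It is not the implementation used in this work: the function name `lle_embed`, its parameters (`n_neighbors`, `n_components`, `reg`), and the toy three-limb shape are all assumptions chosen for illustration, and the dense formulation is only practical for small point sets. The final scaling enforces the unit covariance constraint mentioned above.

```python
import numpy as np

def lle_embed(points, n_neighbors=8, n_components=2, reg=1e-3):
    """Dense, textbook LLE (hypothetical helper, for small point sets only)."""
    X = np.asarray(points, dtype=float)
    n = X.shape[0]

    # 1. k nearest neighbours from pairwise squared Euclidean distances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]

    # 2. Reconstruction weights: each point as an affine combination
    #    of its neighbours (rows of W sum to one).
    W = np.zeros((n, n))
    ones = np.ones(n_neighbors)
    for i in range(n):
        Z = X[nbrs[i]] - X[i]            # neighbours centred on point i
        C = Z @ Z.T                      # local Gram matrix
        C += np.eye(n_neighbors) * reg * max(np.trace(C), 1e-12)  # regularise
        w = np.linalg.solve(C, ones)
        W[i, nbrs[i]] = w / w.sum()

    # 3. Embedding: bottom eigenvectors of M = (I - W)^T (I - W),
    #    skipping the first one (the constant vector for a connected graph).
    I_W = np.eye(n) - W
    M = I_W.T @ I_W
    _, vecs = np.linalg.eigh(M)          # eigenvalues in ascending order
    Y = vecs[:, 1:n_components + 1] * np.sqrt(n)  # so that Y^T Y / n = I
    return Y, W

# Toy stand-in for a voxel set: three straight "limbs" meeting near the origin.
t = np.linspace(0.05, 1.0, 20)
directions = np.eye(3)
points = np.vstack([np.outer(t, d) for d in directions])

Y, W = lle_embed(points, n_neighbors=6, n_components=3)
```

Here `Y` holds one embedded coordinate vector per input point; a clustering step, such as the collinear clustering described above, would then operate on the rows of `Y`.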
Results on synthetic images with ground truth
Handling topology changes
Handling missing data
Comparison with EM clustering
Comparison with ISOMAP
Results on real images
For real voxel-set sequences, ground truth is difficult to gather. The quality of the segmentation can nevertheless be assessed visually on complex sequences.