GAIT ID - Identity recognition from gait

 the problem   state of the art   background   methodology   datasets   references

 A behavioral biometric

Biometrics has received growing attention in the last decade, as automatic identification systems for surveillance and security have enjoyed increasingly widespread diffusion. Biometrics such as face, iris, or fingerprint recognition, in particular, have been widely employed. They suffer, however, from two major limitations: they cannot be used at a distance, and they require the user's cooperation. Such requirements are not practical in real-world scenarios, e.g. the surveillance of public areas.

Interestingly, psychological studies show that people are capable of recognizing their friends just from the way they walk, even when the gait is only poorly represented by point-light displays [7]. Gait has several advantages over other biometrics: it can be measured at a distance, is difficult to disguise or occlude, and can be identified even in low-resolution images. Most importantly, gait recognition is non-cooperative in nature. The person to identify can move freely in the surveyed environment, possibly unaware that his or her identity is being checked. Furthermore, gait and face biometrics can be easily integrated for human identity recognition [44, 16].

Despite its attractive features, though, gait identification is still far from being ready to be deployed in practice. What limits the adoption of gait recognition systems in real-world scenarios is the influence of a large number of so-called covariate factors which affect the appearance and dynamics of the gait. These include walking surface, lighting, camera setup (viewpoint), but also footwear and clothing, carrying conditions, time of execution, and walking speed (see Figure 1). The correlation between these factors can indeed be very significant, as pointed out in [22], making gait difficult to measure and classify.

 A brief state of the art

Human motion analysis has been studied from all points of view by a growing number of research groups over the last three decades. Two different but complementary sub-problems have been identified since the early days of machine vision analysis of human behavior: estimating the pose and motion of the person (tracking), and classifying that motion from a sequence of images (recognition). Concerning classification, however, most attention has traditionally been given to the task of discriminating different actions or activities in smart-room or human-machine interaction scenarios. Only more recently has human identification from gait started to receive growing attention from the vision community.

Vast literature on gait ID. Although gait identification is a relatively recent branch of machine vision, its literature is already too extensive to be covered here in its entirety [30, 13]. The methods employed so far can be roughly divided into model-based and model-free [41]. Gait analysis, generally speaking, involves two separate issues: the shape or appearance of the moving person, and the dynamics of the motion itself. A variety of image or gait signatures have been studied, most of them based on silhouette analysis, even though many other approaches have been explored [29]. Here we focus more specifically on the issue of tackling the numerous nuisance or “covariate” factors such as clothing, illumination, etcetera.

Viewpoint as main covariate factor. The most important of those covariate factors is probably viewpoint variation. In a realistic setup, the person to identify steps into the surveyed area from an arbitrary direction. View-invariance [39, 42, 1, 33, 17, 18] is then a crucial issue in making identification from gait suitable for real-world applications. This problem has indeed been studied in the gait ID context by several groups [14]. If a 3D articulated model of the moving person is available, tracking can be used as a pre-processing stage to drive recognition. Cunado et al. [6], for instance, have used their evidence-gathering technique to analyze leg motion in both walking and running gaits. Yam et al. [42] have also worked on a similar model-based approach. Urtasun and Fua [39] have proposed an approach to gait analysis that relies on fitting 3D temporal motion models to synchronized video sequences. Bhanu and Han [1] have matched a 3D kinematic model to 2D silhouettes. Viewpoint invariance is achieved in [36] by means of a hip/leg model, including the camera elevation angle as an additional parameter. Model-based 3D tracking, however, is a difficult task. Manual initialization of the model is often required, while optimization in a high-dimensional parameter space suffers from convergence issues. Kale et al. [18] have proposed as an alternative a method for generating a synthetic side view of the moving person using a single camera, provided the person is far enough away. Shakhnarovich et al. [33] have suggested a view-normalization technique in a multiple-camera framework, using the volumetric intersection of the visual hulls of all camera silhouettes. A 3D model is also set up in [43] using sequences acquired by multiple cameras, so that the lengths of key limbs and their motion trajectories can be extracted and recognized.
Johnson and Bobick [17] have presented a multi-view gait recognition method using static body parameters recovered during the walking motion across multiple views. More recently, Rogez et al. [32] have used the structure of man-made environments to transform the available image(s) to frontal views, while Makihara et al. [25, 24] have proposed a view transformation model in the frequency domain, acting on features obtained by Fourier analysis of a spatiotemporal volume. An approach to multiple-view fusion based on the product of sum rule has been proposed in [23], where different features and classification methods are compared. The discriminating power of different views has been analyzed in [15], and several evidence combination methods have been tested on the CMU MoBo database [5].

Principled way of tackling covariates lacking. More generally, the effects of all the different covariates have not yet been thoroughly investigated, even though some effort has recently been made in this direction. Bouchrika and Nixon [2] have conducted a comparative study of their influence on gait analysis. Veres et al. [40] have proposed a remarkable predictive model of the time-of-execution covariate to improve recognition performance. The issue has so far been approached on an empirical basis, however, i.e., by trying to measure the influence of individual covariate factors. A principled strategy for their treatment has not yet been brought forward.

 Methodology: tensor modeling of dynamical models

Tensor modeling. In fact, a mathematical formalism general enough to address the fundamental issue of covariate factors in a principled way exists under the name of multilinear/tensorial analysis. The fundamental assumption is that the various factors mix linearly to generate the measurements we observe, in our case the walking gait itself. The problem of recovering those factors given the observations/measurements is often referred to in the literature as nonnegative tensor factorization or NTF [37]. Different proposals on how to solve the tensor factorization problem have been brought forward. The PARAFAC model for multi-way analysis [19] has been applied to continuous electroencephalogram (EEG) classification in the context of brain-computer interfaces [28]. A different multi-layer method for 3D NTF was proposed by Cichocki et al. [4]. Porteus et al. [31] introduced a generative Bayesian probabilistic model for unsupervised tensor factorization: it consists of several interacting LDA models, one for each modality (factor), coupled with a Gibbs sampler for inference. Other approaches to NTF can be found in other recent papers [21, 34, 3].
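To make the linear-mixing assumption concrete, the following toy sketch fits a rank-R nonnegative CP model to a small third-order tensor with simple multiplicative updates. It is a generic illustration under our own naming, not an implementation of any of the cited algorithms:

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product: row (j*K + k) holds B[j] * C[k]."""
    return np.einsum('jr,kr->jkr', B, C).reshape(-1, B.shape[1])

def ntf_cp(D, rank, iters=300, eps=1e-9, seed=0):
    """Nonnegative CP model D[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r],
    fitted with plain multiplicative updates (a sketch, not a tuned solver)."""
    rng = np.random.default_rng(seed)
    I, J, K = D.shape
    A, B, C = (rng.random((n, rank)) for n in (I, J, K))
    D0 = D.reshape(I, -1)                     # mode-0 unfolding, columns j*K + k
    D1 = np.moveaxis(D, 1, 0).reshape(J, -1)  # mode-1 unfolding, columns i*K + k
    D2 = np.moveaxis(D, 2, 0).reshape(K, -1)  # mode-2 unfolding, columns i*J + j
    for _ in range(iters):
        KR = khatri_rao(B, C); A *= (D0 @ KR) / (A @ (KR.T @ KR) + eps)
        KR = khatri_rao(A, C); B *= (D1 @ KR) / (B @ (KR.T @ KR) + eps)
        KR = khatri_rao(A, B); C *= (D2 @ KR) / (C @ (KR.T @ KR) + eps)
    return A, B, C

# synthetic rank-2 nonnegative data, then recover the factors
rng = np.random.default_rng(1)
D = np.einsum('ir,jr,kr->ijk', rng.random((4, 2)), rng.random((3, 2)), rng.random((5, 2)))
A, B, C = ntf_cp(D, rank=2)
err = np.linalg.norm(D - np.einsum('ir,jr,kr->ijk', A, B, C)) / np.linalg.norm(D)
```

The multiplicative form of the updates keeps every factor entrywise nonnegative at all times, which is the defining constraint of NTF.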

Bilinear models [38], in particular, are the best studied among multilinear models. They can be seen as tools for separating two properties, usually called the style and content of the objects to classify. Along this line, De Lathauwer et al. [20] proposed to disentangle the different factors in a multilinear mixture or tensor through a tensor extension of conventional singular value decomposition, or N-mode SVD.

The purpose of this project is to develop a novel framework based on the application of multilinear decomposition techniques to video sequences of walking gaits, represented as simple dynamical models, in order to tackle the presence of nuisance factors which greatly affect recognition and prevent a widespread diffusion of gait identification as a commercially viable biometric. We will build on encouraging results obtained in the recent past [8, 9]. Roughly speaking, most tensorial models share the following features.
Given a training set of video sequences containing walking gaits performed by different people, under different conditions such as viewpoint, illumination, clothing, etcetera, they represent such a training set as a multi-dimensional matrix or “tensor” $D$. Just as matrices can be decomposed into orthogonal column and row subspaces by means of singular-value decomposition (SVD), tensors can be decomposed into a product of $N$ orthogonal spaces $U_i$, one for each dimension of the tensor itself: $D = Z \times U_1 \times \cdots \times U_N$, where $Z$ is called the “core” tensor. This can be done, for instance, by “flattening” the tensor along its $i$-th dimension to get a regular matrix, and subsequently applying standard SVD to that matrix. In our case, one of the dimensions will be associated with the identity of the person performing the action, while each of the others will be related to one of the covariate factors (illumination, viewpoint, etcetera).
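The flattening-plus-SVD construction just described can be sketched numerically in a few lines of NumPy. The dimensions below (5 identities, 3 viewpoints, 40-dimensional gait descriptors) are purely illustrative, and the variable names are ours:

```python
import numpy as np

def unfold(D, mode):
    """Flatten the tensor along the given mode into an ordinary matrix."""
    return np.moveaxis(D, mode, 0).reshape(D.shape[mode], -1)

def mode_product(T, M, mode):
    """Multiply tensor T by matrix M along the given mode."""
    return np.moveaxis(np.tensordot(M, np.moveaxis(T, mode, 0), axes=1), 0, mode)

def hosvd(D):
    """N-mode SVD: one orthogonal factor U_i per mode plus a core tensor Z,
    so that D = Z x_1 U_1 x_2 U_2 ... x_N U_N."""
    # left singular vectors of each mode-i unfolding give the factor U_i
    Us = [np.linalg.svd(unfold(D, m), full_matrices=False)[0] for m in range(D.ndim)]
    Z = D
    for m, U in enumerate(Us):
        Z = mode_product(Z, U.T, m)  # core: D multiplied by every U_i transposed
    return Z, Us

# toy training tensor: identities x viewpoints x gait-descriptor components
rng = np.random.default_rng(0)
D = rng.random((5, 3, 40))
Z, Us = hosvd(D)

# multiplying the core back by every factor recovers the original tensor
D_hat = Z
for m, U in enumerate(Us):
    D_hat = mode_product(D_hat, U, m)
```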
When a new observation becomes available in the form of a test sequence, it is then possible to project it onto the identity subspace and classify it there by applying any off-the-shelf classification algorithm.
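As a minimal sketch of this classification step, assume we have an identity factor matrix `U_id` (one row of identity coordinates per known person) and a hypothetical basis `B` for the identity mode; projection is then a least-squares solve, and classification a nearest-neighbour search. Both names and dimensions are illustrative, not part of the cited methods:

```python
import numpy as np

def classify_identity(d_new, B, U_id):
    """Project a new observation vector onto the identity subspace and
    classify by nearest neighbour among the known identity rows.
    B:    (r, F) basis for the identity mode (hypothetical here).
    U_id: (n_people, r) identity coordinates, one row per person."""
    # least-squares coordinates of d_new in the identity subspace
    c, *_ = np.linalg.lstsq(B.T, d_new, rcond=None)
    dists = np.linalg.norm(U_id - c, axis=1)  # nearest-neighbour rule
    return int(np.argmin(dists))

rng = np.random.default_rng(0)
U_id = rng.standard_normal((5, 3))   # 5 known identities, 3-dim identity subspace
B = rng.standard_normal((3, 40))     # hypothetical identity-mode basis
d_new = B.T @ U_id[2]                # a test vector generated by identity 2
pred = classify_identity(d_new, B, U_id)
```

Any off-the-shelf classifier (SVM, k-NN, etc.) could replace the nearest-neighbour rule in the last step.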

Video sequences as dynamical models. A crucial point here is that, to be described by a single tensorial model, all observations have to be vectors of the same size. Therefore, in order to apply tensor modeling to image sequences, we need to encode each sequence as a vector of the same length. An effective way to do this is to use parameter identification algorithms to represent each video sequence (or better, each sequence of feature measurements extracted from the images) as a simple dynamical model, such as a hidden Markov model (see Figure 2). HMMs have often been used in the past as a way of encoding motions compactly and coping with issues such as time warping, especially in the context of action/activity recognition. They represent actions such as the walking gait as a finite-state dynamical model, whose parameters can be collected in a single observation vector of fixed size for each sequence.
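The fixed-size encoding is the key point: once the numbers of states and emission symbols are fixed, every sequence's learned HMM flattens to a vector of the same length. A minimal sketch (the toy parameter values are illustrative, not learned from data):

```python
import numpy as np

def hmm_to_vector(A, B, pi):
    """Stack the parameters of a discrete HMM (state-transition matrix A,
    emission matrix B, initial distribution pi) into one fixed-size vector."""
    return np.concatenate([A.ravel(), B.ravel(), pi.ravel()])

# toy 3-state, 4-symbol HMM: with the state/symbol counts fixed, every
# training sequence maps to an observation vector of identical length
A = np.full((3, 3), 1 / 3)      # uniform transitions
B = np.full((3, 4), 1 / 4)      # uniform emissions
pi = np.array([1.0, 0.0, 0.0])  # always start in state 0
v = hmm_to_vector(A, B, pi)     # length 3*3 + 3*4 + 3 = 24
```

These same-length vectors are exactly what gets stacked into the training tensor $D$.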

Alternatively, simple linear dynamical models such as autoregressive or ARMA models can be employed to describe the dynamics of the walking gait, drawing inspiration from the impressive results obtained when representing dynamic textures in this way. Regardless of the class of model used to represent each individual sequence, a tensorial model can be built from a training set of walking gaits and later used to classify new videos. The overall framework is illustrated in Figure 3.
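The autoregressive option can be sketched just as compactly: fitting an order-$p$ AR model by least squares yields a coefficient vector of fixed length $p$, usable as the sequence's signature. A toy example on a noiseless scalar sequence (a real pipeline would fit a vector-valued feature sequence):

```python
import numpy as np

def fit_ar(x, p=2):
    """Least-squares fit of an order-p AR model
    x_t = a_1 x_{t-1} + ... + a_p x_{t-p};
    the coefficient vector a serves as a fixed-size signature."""
    y = x[p:]
    X = np.column_stack([x[p - k : len(x) - k] for k in range(1, p + 1)])
    a, *_ = np.linalg.lstsq(X, y, rcond=None)
    return a

# noiseless toy sequence generated by a known AR(2) model
x = np.empty(15)
x[0], x[1] = 1.0, -0.5
for t in range(2, 15):
    x[t] = 0.5 * x[t - 1] + 0.2 * x[t - 2]

a = fit_ar(x, p=2)  # recovers the generating coefficients [0.5, 0.2]
```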

 Background

An application close to commercial maturity. The potential for commercial exploitation of a functioning behavioral biometric system is enormous. It is widely recognized that traditional biometrics are inadequate to deal with the non-cooperative scenarios which characterize present challenges to collective security. Unlike other biometrics, gait identification can be performed at a distance, with low-resolution cameras, requiring neither collaboration nor awareness on the subject’s side. The field is now mature enough to move from simplified experimental settings to more realistic outdoor tests. For instance, DARPA funds a group in the Columbia Automated Vision Environment (CAVE) at Columbia University working on the problem of identifying people in bad weather: http://www1.cs.columbia.edu/CAVE/HIDweather/.

Political and industrial background. In view of the future commercial or public exploitation of behavioral biometrics, this is just the right time for the UK to take the lead in this field. As a matter of fact, several funding bodies in foreign countries or at the European level have already recognized the potential value of behavioral biometrics in future surveillance and security scenarios. The initiative on human ID was originally launched by DARPA several years ago with a 50-million-dollar project of enormous impact: http://infowar.net/tia/www.darpa.mil/iao/HID.htm.

The US has even funded related research in Europe through the European Research Office of the U.S. Army, which sponsored such research at the University of Southampton from 2000 to 2004. DARPA is still pursuing this strategy, having launched a new search engine for video surveillance footage: http://arstechnica.com/old/content/2008/10/darpa-building--search-engine-for-video-surveillance-footage.ars.

The EU has recently funded a STREP called HUMABIO whose focus is on combining “new types of biometrics with state of the art sensorial technologies in order to enhance security in a wide spectrum of applications like transportation safety and continuous authentication in safety critical environments”: http://www.humabio-eu.org/.

On their side, the Royal Society and the National Natural Science Foundation of China (NSFC) have funded, under an International Joint Grant Scheme, the development of models for the fusion of multiple biometrics (human face and gait) for robust and efficient non-intrusive person identification at a distance: http://www.dcs.qmul.ac.uk/~sgg/NSFC_RSL/index.html.

Similar initiatives have been funded by the Japanese National Institute of Advanced Industrial Science and Technology: http://www.aist.go.jp.

In the UK, EPSRC has recognized the growing importance of behavioral biometrics by supporting an International Centre for Advanced Research in Identification Science (ICARIS - EPSRC Reference: GR/S66671/01). The network’s ambitious aim is to promote innovative, integrated and advanced inter-disciplinary research in human identification science, in order to meet the national human identification knowledge needs of the forthcoming 5-20 years. Another small project, on the feasibility of footsteps as a biometric, has just ended. More efforts are necessary to ensure the UK’s competitiveness in this promising discipline in the short-to-medium term.

 Datasets

 Bibliography

[1] B. Bhanu and J. Han, Individual recognition by kinematic-based gait analysis, 2002, pp. III: 343–346.

[2] I. Bouchrika and M. Nixon, Exploratory factor analysis of gait recognition, Proc. of the 8th IEEE International Conference on Automatic Face and Gesture Recognition, 2008.

[3] C. Boutsidis, E. Gallopoulos, P. Zhang, and R.J. Plemmons, PALSIR: A new approach to nonnegative tensor factorization, Proc. of the 2nd Workshop on Algorithms for Modern Massive Datasets, 2006.

[4] A. Cichocki, R. Zdunek, R. Plemmons, and S. Amari, Novel multi-layer nonnegative tensor factorization with sparsity constraints, Lecture Notes in Computer Science (Springer Verlag, ed.), vol. 4432, 2007, pp. 271–280.

[5] R.T. Collins, R. Gross, and J.B. Shi, Silhouette-based human identification from body shape and gait, 2002, pp. 351–356.

[6] D. Cunado, J.M. Nash, M.S. Nixon, and J.N. Carter, Gait extraction and description by evidence-gathering, Proc. of AVBPA99, 1999, pp. 43–48.

[7] J. Cutting and L. Kozlowski, Recognizing friends by their walk: Gait perception without familiarity cues, Bull. Psychon. Soc. 9 (1977), 353–356.

[8] F. Cuzzolin, Using bilinear models for view-invariant action and identity recognition, Proc. of CVPR’06, vol. 2, 2006, pp. 1701–1708.

[9] F. Cuzzolin, Multilinear modeling for robust identity recognition from gait, Behavioral Biometrics for Human Identification: Intelligent Applications (Liang Wang and Xin Geng, eds.), IGI Publishing, 2009.

[10] F. Cuzzolin, Diana Mateus, David Knossow, Edmond Boyer, and Radu Horaud, Coherent laplacian protrusion segmentation, Proc. of the IEEE Computer Society conference on Computer Vision and Pattern Recognition (CVPR’08), 2008.

[11] Fabio Cuzzolin, Augusto Sarti, and Stefano Tubaro, Action modeling with volumetric data, Proc. of the 2004 International Conference on Image Processing (ICIP’04), vol. 2, 2004, pp. 881– 884.

[12] Edmond Boyer Fabio Cuzzolin, Diana Mateus and Radu Horaud, Robust spectral 3d-bodypart segmentation in time, Human Motion - Understanding, Modeling, Capture and Animation - Lecture Notes in Computer Science (H.-P. Seidel, Y. Wang, B. Rosenhahn, and G. Mori, eds.), vol. 4814/2007, Springer Berlin / Heidelberg, 2007, pp. 196–211.

[13] D. Gafurov, A survey of biometric gait recognition: Approaches, security and challenges, Proc. of NIK-2007, 2007.

[14] J. Han, B. Bhanu, and A.K. Roy Chowdhury, A study on view-insensitive gait recognition, 2005, pp. III: 297–300.

[15] X. Huang and N.V. Boulgouris, Human gait recognition based on multiview gait sequences, EURASIP Journal on Advances in Signal Processing (2008).

[16] R. Jafri and H.R. Arabnia, Fusion of face and gait for automatic human recognition, Proc. of the Fifth International Conference on Information Technology, 2008.

[17] A.Y. Johnson and A.F. Bobick, A multi-view method for gait recognition using static body parameters, 2001, p. 301.

[18] A. Kale, A.K. Roy Chowdhury, and R. Chellappa, Towards a view invariant gait recognition algorithm, 2003, pp. 143– 150.

[19] H.A.L. Kiers, Towards a standardized notation and terminology in multiway analysis, Journal of Chemometrics 14 (2000), no. 3, 105–122.

[20] L. De Lathauwer, B. De Moor, and J. Vandewalle, Multilinear singular value decomposition, SIAM Journal of Matrix Analysis and Applications 21 (2000), no. 4.

[21] H. Lee, Y.-D. Kim, A. Cichocki, and S. Choi, Nonnegative tensor factorization for continuous EEG classification, International Journal of Neural Systems 17 (2007), no. 4, 305–317.

[22] X.L. Li, S.J. Maybank, S.J. Yan, D.C. Tao, and D.J. Xu, Gait components and their application to gender recognition, 38 (2008), no. 2, 145–155.

[23] J.W. Lu and E. Zhang, Gait recognition for human identification based on ica and fuzzy svm through multiple views fusion, 28 (2007), no. 16, 2401–2411.

[24] Y.S. Makihara, R. Sagawa, Y. Mukaigawa, T. Echigo, and Y.S. Yagi, Gait recognition using a view transformation model in the frequency domain, 2006, pp. III: 151–163.

[25] Y.S. Makihara, R. Sagawa, Y. Mukaigawa, T. Echigo, and Y.S. Yagi, Which reference view is effective for gait identification using a view transformation model?, 2006, p. 45.

[26] Diana Mateus, Fabio Cuzzolin, Edmond Boyer, and Radu Horaud, Articulated shape matching using locally linear embedding and orthogonal alignment, 11th International Conference on Computer Vision (ICCV’07) - NTRL Workshop, 2007.

[27] Diana Mateus, Radu Horaud, David Knossow, Fabio Cuzzolin, and Edmond Boyer, Articulated shape matching using laplacian eigenfunctions and unsupervised point registration, Proc. of the IEEE Computer Society conference on Computer Vision and Pattern Recognition (CVPR’08), 2008.

[28] M. Morup, L.K. Hansen, C.S. Herrmann, J. Parnas, and S.M. Arnfred, Parallel factor analysis as an exploratory tool for wavelet transformed event-related EEG, NeuroImage 29 (2006), no. 3, 938–947.

[29] C. Nandini and C.N. Ravi Kumar, Comprehensive framework to gait recognition, Int. J. Biometrics 1 (2008), no. 1, 129–137.

[30] M.S. Nixon and J.N. Carter, Automatic recognition by gait, 94 (2006), no. 11, 2013–2024.

[31] I. Porteus, E. Bart, and M. Welling, Multi-HDP: A nonparametric Bayesian model for tensor factorization, Proc. of AAAI 2008, 2008, pp. 1487–1490.

[32] G. Rogez, J.J. Guerrero, J. Martinez del Rincon, and C. Orrite-Uranela, Viewpoint independent human motion analysis in man-made environments, Proc. of BMVC’06, 2006.

[33] G. Shakhnarovich, L. Lee, and T.J. Darrell, Integrated face and gait recognition from multiple views, 2001, pp. I:439– 446.

[34] A. Shashua and T. Hazan, Non-negative tensor factorization with applications to statistics and computer vision, Proceedings of the 22nd International Conference on Machine Learning, 2005, pp. 792–799.

[35] Andrea Sorrentino, Fabio Cuzzolin, and Ruggero Frezza, Using hidden Markov models and dynamic size functions for gesture recognition, Proc. of the 8th British Machine Vision Conference (BMVC97), vol. 2, 1997, pp. 560–570.

[36] N.M. Spencer and J.N. Carter, Viewpoint invariance in automatic gait recognition, Proc. of AutoID, 2002, pp. 1–6.

[37] D. Tao, X. Li, X. Wu, and S.J. Maybank, General tensor discriminant analysis and Gabor features for gait recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (2007), no. 10, 1700–1715.

[38] J.B. Tenenbaum and W.T. Freeman, Separating style and content with bilinear models, Neural Computation 12 (2000), no. 6, 1247–1283.

[39] R. Urtasun and P. Fua, 3D tracking for gait characterization and recognition, Tech. Report IC/2004/04, Swiss Federal Institute of Technology, Lausanne, Switzerland, 2004.

[40] G.V. Veres, M.S. Nixon, and J.N. Carter, Model-based approaches for predicting gait changes over time, 2005, p. 213.

[41] G.V. Veres, M.S. Nixon, and J.N. Carter, Modelling the time-variant covariates for gait recognition, 2005, p. 597.

[42] C.Y. Yam, M.S. Nixon, and J.N. Carter, Automated person recognition by walking and running via model-based approaches, 37 (2004), no. 5, 1057–1072.

[43] G.Y. Zhao, G.Y. Liu, H. Li, and M. Pietikainen, 3d gait recognition using multiple cameras, 2006, pp. 529–534.

[44] X.L. Zhou and B. Bhanu, Integrating face and gait for human recognition at a distance in video, 37 (2007), no. 5, 1119–1137.