Estimating 3D Arm Motion with a Hierarchical Limb Model

To address the low computational efficiency of tracking human 3D motion, an algorithm for estimating 3D arm motion with a Hierarchy Limb Model (HLM) is proposed. The HLM is built on the human 3D skeleton model. Facilitated by graph decomposition, the arm motion state space modeled by the HLM can be decomposed into low-dimensional subspaces. A top-down search strategy and the particle filter are used to track the arm motion, which reduces the number of particles needed for tracking. To handle severe self-occlusions, the weighted color histogram and the image contour are used to model the observation likelihood function. Experimental results show that our algorithm improves computational efficiency and handles self-occlusions effectively.


Introduction
Human 3D motion estimation has received significant attention in recent years, driven by its wide range of applications such as video surveillance, human activity analysis, and computer animation. However, it remains a challenging task because the computational complexity grows exponentially with the number of degrees of freedom of the object, and because frequent self-occlusions cause severe image ambiguities.
Moeslund et al. (Moeslund, 2001, Moeslund, 2006) comprehensively surveyed research on human 3D motion estimation. They classified existing pose estimation algorithms into learning-based and model-based algorithms. Learning-based algorithms can be further separated into two strategies. One is based on learning prior knowledge of human motion: it uses a strong motion prior to constrain the search to the most likely region of the parameter space (Urtasun, 2006, North, 2000). One way to cope with the high-dimensional state space is to learn low-dimensional latent variable models. Many algorithms have been applied to learn such models, including Principal Component Analysis (Sidenbladh, 2000, Urtasun, 2005, Sidenbladh, 2002), Relevance Vector Regression (Agarwal, 2004), nonlinear Gaussian process dynamic models (Urtasun, 2005), and Partial Least Squares (Xinyu, 2007). The model-based category builds the human motion model from prior knowledge of the human body and human motion constraints, and uses stochastic sampling techniques in model-based analysis-by-synthesis to obtain the optimal estimate within the Bayesian network framework. As a nonlinear filtering algorithm based on the Bayesian estimation framework, the particle filter (Blake, 1998) has been widely applied (Azad, 2004, Saboune, 2005) to human 3D motion estimation. Deutsher et al. (Deutsher, 2005) proposed the annealed particle filter to track human 3D motion. Markov Chain Monte Carlo (Sminchisescu, 2003, MunWai, 2006) has been used to address the particle degeneracy problem. Recently, the structure graphical model (Sigal, 2004) has been used to facilitate the estimation of human 3D motion.
Although learning-based methods effectively address low computational efficiency and ambiguities, they cannot track arbitrary activities in natural scenes and can only track motions that have been learned. Furthermore, because of the complexity of human motion, the learning process requires a large number of samples and a high time cost. Model-based methods also need a large number of samples to describe human motion, and the number of samples and the time cost grow exponentially with the dimension of the human motion state space.
Wu et al. (Wu, 2003) proposed a mean field Monte Carlo (MFMC) algorithm based on a dynamic Markov network for 2D articulated body tracking, which decomposes the human motion state space into multiple linear subspaces. Wei et al. (Wei, 2007) proposed a decentralized articulated graphical model to improve computational efficiency in 2D human motion tracking. Building on the decentralized articulated graphical model, we propose a particle filter based on the hierarchy limb model for estimating 3D arm motion. Facilitated by graph decomposition, the hierarchy limb model decomposes the right arm motion state space into two linear subspaces, and our algorithm searches each subspace with the particle filter. As a result, computational efficiency improves because the search space has lower dimensionality and fewer particles are needed. To handle the severe self-occlusion problem efficiently, our algorithm introduces an angle relation model between the upper arm and the lower arm during arm motion. To build the angle relation model, the least squares method is used to fit straight lines to the arm contour.
The paper is organized as follows. Section 2 describes the hierarchy limb model and the tracking algorithm based on the particle filter. The image likelihood function is described in Section 3. Experimental results and analysis are presented in Section 4, and the final section concludes the paper.

Frameworks
In this section, we describe the key components of the arm motion estimation framework, namely, the hierarchy limb model, and the estimation framework based on the particle filter.

Arm Hierarchy Limb Model
Our algorithm uses a generic model that represents the arm structure. The arm model, as illustrated in Figure 1, consists of two components: the kinematics model and the structure graphical model.

Arm Kinematics Model
Each limb of the arm kinematics model has two components: a kinematics vector and a shape vector. The kinematics vector consists of six parameters and is used to predict the arm 3D motion. The shape vector consists of seven parameters and describes the approximate 3D shape of the arm.
To represent the kinematic state of each limb, we define the kinematics vector as , where is the global translation vector and the rotation vector, , gives the angles through which the limb rotates about the three coordinate axes, as shown in Figure 1(c). The limb shape vector includes three 3D cylinder constants and four shape constants in the image plane. The three cylinder constants are the height of the cylinder, the radius of the cylinder, and the origin of the local coordinate system, where . The shape vector in the image plane is the vertex set of the quadrilateral that approximates the cylinder on the image plane, as illustrated in Figure 2. We define the shape vector as , where , . As a result, the shape vector can be defined as .
The arm state space is represented as , where is formed by the 3D coordinate triplet giving the ground-truth position of the right shoulder, denotes the right upper arm, and denotes the right lower arm. We denote the model of each limb as follows: (1)

According to the characteristics of arm motion, the motion of any node of the arm interacts only with its child nodes. For example, the motion of the lower arm is constrained only by the motion of the corresponding upper arm and not by any other limb. Using the arm hierarchy model, the problem of tracking right arm motion can be formulated as the prediction of at time .

Tracking with Particle Filter
Using the arm hierarchy limb model, we propose a tracking framework based on the particle filter, together with a particle generation scheme.

Tracking framework
Using the arm hierarchy limb model, the right arm motion can be decomposed into the motion of two parts, while the state space is decomposed into three low-dimensional state spaces in the tracking process. Based on this decomposition, optimization over the whole state space can be formulated as optimization over the state subspace of each limb, carried out in top-down order with the particle filter.
The state parameter of the right arm motion at time t is represented in the form of a joint state, as shown in Eqn. 3: (3) where i is the index of parts. Assuming the father node of node is defined as , the observation state of all limbs is represented as . The posterior probability distribution for the right arm motion is given by: (4) where is defined as the predicted value of part at time t, is the observation value of the father associated with , and is defined as the maximum a posteriori (MAP) estimate for the right shoulder, where is a constant. As a result, can be approximated by the following expression: (5) where N is the particle count, k is the index of particles, is the kth particle of the ith part at time t-1, and is the weight associated with , which can be updated as in Eqn. 6.
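The top-down search described above can be sketched in Python roughly as follows. This is a minimal illustration and not the authors' implementation: the function names (`track_limb`, `track_arm`), the dictionary layout of a limb, the Gaussian diffusion used for propagation, and the use of the highest-weight particle as the point estimate are all assumptions made for the example. Each limb runs its own low-dimensional particle filter and passes its estimate down to its child.

```python
import numpy as np

def track_limb(limb, parent_estimate, likelihood, rng, noise_std=0.05):
    """One particle-filter step for one limb, conditioned on its parent's estimate."""
    particles, weights = limb["particles"], limb["weights"]
    n = len(particles)
    # Resample indices in proportion to the weights from time t-1.
    idx = rng.choice(n, size=n, p=weights)
    # Propagate with Gaussian noise (a placeholder for the state transition model).
    predicted = particles[idx] + rng.normal(0.0, noise_std, size=particles.shape)
    # Weight each particle by its observation likelihood given the parent estimate.
    w = np.array([likelihood(p, parent_estimate) for p in predicted]) + 1e-300
    w /= w.sum()
    limb["particles"], limb["weights"] = predicted, w
    # Use the highest-weight particle as a MAP-style point estimate (assumption).
    return predicted[np.argmax(w)]

def track_arm(limbs, shoulder, likelihoods, rng):
    """Top-down pass over the limbs, e.g. upper arm first, then lower arm."""
    parent_estimate = shoulder
    estimates = []
    for limb, likelihood in zip(limbs, likelihoods):
        parent_estimate = track_limb(limb, parent_estimate, likelihood, rng)
        estimates.append(parent_estimate)
    return estimates
```

With two limbs and N particles each, one pass evaluates the likelihood 2N times, which is the source of the efficiency gain claimed for the hierarchical search.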

Particle Generation
In this section, we describe in detail the particle generation based on the arm hierarchy limb model and the tracking framework.
In the particle filter framework, particles are generated by the state transition model shown in Eqn. 8: (8) where is Gaussian noise whose expectation is a 3×1 vector, defined as the motion speed of the part, and whose variance is a 3×3 diagonal matrix. The motion speed of part i depends on the speed of part i at time t-1 and on the motion speed of its father part F(i). We represent the motion speed of part i at time t as the row vector , whose elements are the rotation angular speeds about the X, Y, and Z axes; the elements of the row vector are mutually independent. The row vector is defined as the motion speed of F(i). If t<3, is determined by the following equation: (9) If t≥3, , , can be calculated independently by Eqn. 10: (10) where the coefficients , , are 2×1 vectors obtained by the least squares method, representing the speed coefficient vectors of part i at time t-1 for the X, Y, and Z axes.
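As a rough illustration of the particle generation step, the sketch below fits a 2×1 coefficient vector for one axis by least squares and then draws a particle whose noise mean is the predicted speed. The exact regressors of Eqn. 10 are not recoverable from the text, so the choice here (the part's own previous speed and the father part's previous speed) is an assumption, as are the function names and the additive form of the transition.

```python
import numpy as np

def fit_speed_coefficients(own_speed, parent_speed):
    """Least-squares fit, for one axis, of
    own_speed[t] ~ a * own_speed[t-1] + b * parent_speed[t-1] (assumed form)."""
    X = np.column_stack([own_speed[:-1], parent_speed[:-1]])
    y = own_speed[1:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs  # array of shape (2,): [a, b]

def predict_speed(coeffs, own_prev, parent_prev):
    """Predicted angular speed for one axis from the fitted coefficients."""
    return coeffs[0] * own_prev + coeffs[1] * parent_prev

def sample_rotation(prev_angles, mean_speed, var_diag, rng):
    """Draw one particle: previous angles plus Gaussian noise whose mean is the
    predicted speed and whose covariance is diagonal (axes are independent)."""
    noise = rng.normal(mean_speed, np.sqrt(var_diag))
    return prev_angles + noise
```

Because the three axes are treated as independent, the fit is run once per axis and the three predicted speeds are stacked into the 3×1 noise mean.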

Image Likelihood Function
In the particle filter framework, the observation likelihood model represents how well the human appearance model matches the features extracted from the image. In this section, color distribution and image edge information are used to calculate the matching similarity between the human appearance model and the features extracted from the image.

Color Distribution Likelihood
Color distributions are used as target models because they are robust against non-rigidity, rotation, and partial occlusion. A weighted color histogram with m=8×8×8=512 bins is chosen and calculated in HSV color space to reduce the effect of illumination.
The projection quadrilateral of the limb shape vector set is defined as , is the projection of the origin of the local coordinate system onto the image plane, and the color distribution is defined as . For any pixel point , can be calculated by the following expression: (11) where is the delta function, is the area of , and is the normalization constant. The Bhattacharyya distance is used to calculate the similarity between two weighted color histograms.
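A weighted color histogram and its comparison can be sketched as below. Since Eqn. 11 is not recoverable here, the kernel profile (1 - r² inside the region, 0 outside), the circular region given by `center` and `radius`, and the distance form sqrt(1 - Σ sqrt(p·q)) are assumptions borrowed from common color-based particle filter practice, and the function names are hypothetical.

```python
import numpy as np

def weighted_color_histogram(hsv_pixels, positions, center, radius, bins=8):
    """hsv_pixels: (n, 3) floats in [0, 1); positions: (n, 2) pixel coordinates."""
    # Kernel weight: pixels far from the region center count less (assumed profile).
    r = np.linalg.norm(positions - center, axis=1) / radius
    w = np.where(r < 1.0, 1.0 - r ** 2, 0.0)
    # Assign each pixel to one of bins^3 cells; this plays the role of the delta function.
    q = np.minimum((hsv_pixels * bins).astype(int), bins - 1)
    idx = (q[:, 0] * bins + q[:, 1]) * bins + q[:, 2]
    hist = np.bincount(idx, weights=w, minlength=bins ** 3)
    total = hist.sum()
    # Normalize so that the histogram sums to one.
    return hist / total if total > 0 else hist

def bhattacharyya_distance(p, q):
    """Distance between two normalized histograms: 0 for identical, 1 for disjoint."""
    bc = np.sum(np.sqrt(p * q))
    return np.sqrt(max(0.0, 1.0 - bc))
```

With `bins=8` the histogram has 8×8×8=512 cells, matching the bin count stated above.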

Edge Likelihood
We segment the arm from the background by combining background subtraction with a skin detector. The least squares method is then used to fit a line to the edge points of the contour in order to calculate the slope of the long edge of the contour.
(1) The human contour is extracted from the background by background differencing, and mathematical morphology is used to refine the whole human contour.
(2) The right arm contour is split from the human contour using the ground truth of the initial frame.
(3) The contour point set is detected from the arm contour by a contour detection method. The skin detector divides the contour point set into two subsets: the point set of the right upper arm contour, , and the point set of the right lower arm contour, .
(4) is the slope of the long edge of the right upper arm, obtained by line fitting with the least squares method, and is the corresponding slope for the right lower arm.
Assume that at time t, is the slope of the long edge of the quadrilateral that is the projection of the limb shape vector set of part i. Then can be modeled as a Gaussian distribution as follows: (12) where i is the index of limbs, k is the index of particles, is equal to , which is the slope of the long edge associated with limb i, and is the covariance of the slope set .
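Steps (4) and the Gaussian model above can be illustrated with a short sketch. It assumes the contour points are given as (x, y) pairs and that the likelihood is a one-dimensional Gaussian in the slope difference with standard deviation `sigma`; the function names are hypothetical and the precise form of Eqn. 12 may differ.

```python
import numpy as np

def fit_slope(points):
    """Least-squares line y = m*x + c through contour points of shape (n, 2); returns m."""
    m, _c = np.polyfit(points[:, 0], points[:, 1], deg=1)
    return m

def edge_likelihood(particle_slope, observed_slope, sigma):
    """Gaussian likelihood of a particle's projected edge slope given the fitted slope."""
    d = particle_slope - observed_slope
    return np.exp(-0.5 * (d / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)
```

A particle whose projected long edge is parallel to the fitted contour line receives the highest weight. Note that a slope parameterization becomes ill-conditioned for near-vertical limbs, so a practical implementation might fit x as a function of y in that case.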

Self-occlusion
To handle severe self-occlusion, we propose an algorithm based on an angle relation model for two intersecting lines, which are obtained by the method described in Section 3.2.
The angle between the upper arm and the lower arm is defined as , and is a constant threshold determined empirically. The posterior probability can then be represented as shown in Eqn. 13. (13)
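The angle between the two fitted lines follows from their slopes by the standard identity tan θ = |(m1 - m2) / (1 + m1·m2)|. The sketch below computes it and compares it with a threshold. How the threshold enters Eqn. 13 is not recoverable from the text, so treating an angle below the threshold as the occluded case is purely an assumption, and the function names are hypothetical.

```python
import math

def angle_between(m_upper, m_lower):
    """Acute angle in radians between two lines with slopes m_upper and m_lower."""
    denom = 1.0 + m_upper * m_lower
    if denom == 0.0:
        return math.pi / 2  # the lines are perpendicular
    return math.atan(abs((m_upper - m_lower) / denom))

def below_threshold(m_upper, m_lower, threshold):
    """True when the arm segments are nearly aligned (assumed occlusion case)."""
    return angle_between(m_upper, m_lower) < threshold
```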

Experimental Design
We conducted experiments on tracking right arm motion using the HumanEva data sets (Sigal, 2006). The angle threshold of subsection 3.3 is set to the empirical value: .

Experimental Result
Based on the parameters set in the previous subsection, we track the right arm motion using the particle filter based on the arm hierarchy limb model. Across the experiments, the particle count for tracking each limb is 50, 100, 150, and 200, so the corresponding particle count for all joints is 100, 200, 300, and 400.
Table 1 compares the mean error (Mean) and the error variance (Std.) between the ground truth and the predicted value of the right lower arm in the X, Y, and Z directions for different particle counts using our algorithm. Eqn. 15 defines the mean error and Eqn. 16 defines the error variance. (15) In Eqn. 15 and Eqn. 16, T is the number of frames of the test video, with T=796, is the predicted value, and is the ground truth at frame t.
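Per-axis error statistics of this kind can be computed as in the sketch below. Because Eqn. 15 and Eqn. 16 are not recoverable here, the use of the absolute error and of the population standard deviation is an assumption, and `error_stats` is a hypothetical helper name.

```python
import numpy as np

def error_stats(pred, truth):
    """pred, truth: arrays of shape (T, 3) holding X, Y, Z positions per frame.
    Returns the per-axis mean and standard deviation of the absolute error."""
    err = np.abs(pred - truth)
    return err.mean(axis=0), err.std(axis=0)
```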
Table 1 shows that the mean error and error variance between the prediction and the ground truth do not change noticeably as the particle count of the limbs increases. We therefore conclude that, within the tested range, the particle count for the limbs has little effect on the tracking result of our algorithm. Figure 5 shows the tracking results of 3D arm motion by our algorithm when the particle count for the limbs is 400. There is no evident difference between the tracking results of our algorithm and the real pose of the arm.

Experimental Analysis
Let K be the number of joints that must be tracked in each tracking step, and let N be the number of particles needed to track each joint. Our algorithm, the particle filter based on the arm hierarchy limb model (AHLMPF), then needs KN particles for all limbs, and its computational complexity is O(KN). By contrast, the standard particle filter (SPF) generates N^K combination patterns of particles in the whole state space, corresponding to N^K motion states, so its computational complexity is O(N^K). In our experiment, K is 2 and N is 200, 150, 100, and 50. To track the right arm motion, the particle count of AHLMPF is therefore 400, 300, 200, and 100, while the standard particle filter generates 40000, 22500, 10000, and 2500 combination patterns in the state space.
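The counts quoted above follow directly from KN and N^K. A small sketch (with a hypothetical helper name) reproduces them for K=2:

```python
def particle_counts(n_per_joint, k_joints):
    """Return (hierarchical count K*N, joint-space count N**K)."""
    return k_joints * n_per_joint, n_per_joint ** k_joints

for n in (200, 150, 100, 50):
    hierarchical, joint_space = particle_counts(n, 2)
    print(n, hierarchical, joint_space)
```

The gap widens quickly with K: for the same N, each extra joint adds N particles to the hierarchical search but multiplies the joint-space count by N.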
Based on the parameters set in subsection 4.1, Table 2 compares the average time for tracking one frame between the two algorithms. Table 3 compares the mean error (Mean) and the error variance (Std.) between the ground truth and the values predicted by the two algorithms in the X, Y, and Z directions.
As Table 2 shows, the time cost of AHLMPF is lower than that of SPF as the particle count increases, so computational efficiency is clearly improved. As shown in Table 3, the Mean and Std. of the tracking results of AHLMPF and SPF, each compared with the ground truth, show no evident difference.

Conclusions
This paper proposes a fast tracking algorithm for 3D arm motion. Based on the AHLM, the algorithm replaces the global optimal search over the whole state space with a top-down search over the joints, without changing the dimension of the state space. During tracking, the particle count is reduced by predicting each joint of the AHLM separately. The experiments show that the tracking result of our algorithm has no evident difference from that of the standard particle filter for the same state space dimension.
Structure Graphical Model
The right arm can be represented by an arm graphical model such as the one shown in Figure 3(a). Each circle node corresponds to a part of the right arm, such as the right upper arm or the right lower arm. The square nodes are the observation values associated with the circle nodes. The undirected links represent physical constraints among different parts of the right arm. The directed link from a part's state to its associated observation represents the local observation likelihood. To describe the motion of an articulated object, we accommodate the state dynamics with a dynamic graphical model such as the one shown in Figure 3(b). It contains two consecutive time frames. The directed links between consecutive states represent the dynamic transition from time t-1 to time t.

Figure 4(a) shows the decomposition result for the right arm in Figure 3(b), and Figure 4(b) is the associated moral graph obtained via the separation theorem and the characteristics of the dynamic Markov network.
The data sets were captured at 25 fps by Leonid et al. of Brown University (USA) using the VICON system. The experiment uses the color video of right arm motion recorded from the front, to reduce self-occlusions. The tracking experiments were implemented with Visual Studio .NET 2003 on a PC with a dual-core 1.8 GHz processor and 1 GB of DDR memory. The video is a 796-frame image sequence with an image resolution of 640×480. The spatial position of the right shoulder joint does not change noticeably in the experimental video, so Eqn. 5 can be simplified as the following equation:

Table 2. Time-cost comparison between AHLMPF and SPF under different particle counts

Table 3. Comparison of Mean and Std. for tracking the right wrist using AHLMPF and SPF