Pedestrian Detection Based on Multi-Block Local Binary Pattern and Biologically Inspired Feature

Pedestrian detection plays an important role in security and driving assistance. Detecting moving objects is complex, and some detection methods are comparatively ineffective and slow. For human detection it is very useful to combine independent information sources, such as appearance and motion. To achieve acceptable detection performance, we propose using the inter-frame differencing image to compute the region of interest, and MB-BIF to extract features. The MB-BIF approach combines two well-known methods, the Multi-Block Local Binary Pattern and the Biologically Inspired Method. We evaluate the performance of different feature descriptors on different databases, and our method shows good efficiency.


Introduction
In recent years, there has been great progress in detection systems. Considerable effort has gone into the development of pedestrian detection systems because of their importance in practical applications such as driving safety and surveillance. Detecting moving pedestrians is important in video surveillance. In general, motion detection is based on three approaches (Radke, Andra, Al-Kofahi, & Roysam, 2005): optical flow (Velastin, Boghossian, Lo, Sun, & Vicencio-Silva, 2005), based on the intensity change of the moving objects; background subtraction (Piccardi, 2004), based on the difference between the given frame and the background model; and frame differencing (Kim & Hwang, 2002), based on the simple difference between consecutive frames. These approaches treat pedestrian detection as a two-category classification problem: human versus non-human. There is a need for an automatic system that will detect a person in a complex scene; this step is crucial for the analysis of human movements. Although a significant number of approaches have been developed, pedestrian detection is still far from being solved in practice (Sharma & Davis, 2007). The complexity of pedestrian detection is due to the wide variation in complexion, backgrounds and environmental conditions. Therefore, considerable attention is given to implementing a system capable of differentiating humans in a captured image with high accuracy, precision, and efficiency.
Pedestrian detection has been extensively studied, and significant work has been devoted to detecting, locating, and tracking people in images and videos. Stereovision, based on the Fundamental Matrix algorithm, was the first approach used for human detection in the late 1990s (Franke & Kutzbach, 1996). In 1998, Heisele and Woehler developed an algorithm that detects and classifies individuals based on the movement of the legs relative to the ground (Heisele & Woehler, 1998). Haar-like features with a support vector machine (SVM) were proposed in the same year (Papageorgiou, Oren, & Poggio, 1998). In 2000, researchers started taking advantage of face detection algorithms. In 2001, owing to the success of face detection, work on human detection sped up; many works are based on face detection techniques such as the Viola and Jones method, which gives good results (Viola & Jones, 2001b). Three years later, LBP (Local Binary Patterns) was introduced in the same area of face detection (Ahonen, Hadid, & Pietikainen, 2004). In 2005, the HOG (Histogram of Oriented Gradients) algorithm was introduced by researchers at INRIA (Institut National de Recherche en Informatique et en Automatique) and has proven to be efficient (Dalal & Triggs, 2005). In 2006, attention turned to the covariance descriptor (Tuzel, Porikli, & Meer, 2006). In 2007, shape features were introduced to describe local pieces (head, shoulder, and body) (Sabzmeydani & Mori, 2007). In 2008, researchers at Rutgers University improved on this work by using the covariance matrix as a descriptor (Tuzel, Porikli, & Meer, 2008). Significant performance has been obtained by combining different pedestrian feature extraction approaches and classification algorithms (Paisitkriangkrai, Shen, & Zhang, 2008).
The key step in image recognition and video analysis is feature extraction from the region of interest (ROI). This paper proposes a pedestrian detector inspired by the human visual system, with a hierarchical topographical architecture. This architecture maps the input image to the feature-extraction layers, and the last layer performs the classification as pedestrian or not. To solve the problems encountered in standard moving object detection methods, we propose a fast determination of the ROI and MB-BIF features.
The paper is organized as follows: Section 2 reviews the related work. Section 3 presents the framework of our approach. Section 4 describes the detection procedure. Section 5 analyzes the experimental results, and Section 6 concludes.

Related Work
Pedestrian detection has attracted extensive interest in the computer vision community over the past years.
Here we present some motion detection approaches. Broggi, Fascioli, Carletti, Graf and Meinecke (2004) used multi-resolution texture symmetry, edge symmetry and edge density for ROI extraction. In Liu and Fujimura (2004), the strategy is to apply an intensity threshold and a motion constraint. In Xu, Liu, and Fujimura (2005), intensity thresholds are combined with a support vector machine classifier and a Kalman filter. Sliding window approaches are more promising when the image resolution is low. The first sliding window approach was proposed by Papageorgiou; it applied a support vector machine to an over-complete dictionary of multi-scale Haar wavelets (Papageorgiou & Poggio, 2000). The histogram of oriented gradients (HOG) introduced by Dalal and Triggs (2005) gained popularity over intensity-based features. The Haar-like features proposed by Viola and Jones (2001a) achieved real-time performance by introducing the integral image and a multistage classifier structure. Combining many features increases detector efficiency; Wu and Nevatia (2008) combined HOG, edgelet and covariance features. Multilayer networks are widely used in pattern recognition, for example in face detection (Garcia & Delakis, 2004), handwritten digit recognition (LeCun, Bottou, Bengio, & Haffner, 1998), and facial expression analysis (Fasel, 2002). Hubel and Wiesel (1965) introduced the ideas behind such networks, in which local features are extracted following a certain hierarchical structure. The network architecture (Fukushima, 2007) is made of many stages that in turn contain different layers, called the S-layer (simple layer), the C-layer (complex layer) and the V-layer. Dollar, Wojek, Schiele and Perona (2012) present a comprehensive survey of pedestrian detection. The present work introduces a new region of interest determination, efficient feature extraction, and powerful classification.

Our Approach
Figure 1 presents our system overview. The input video and images are preprocessed using video denoising (Zuo, Liu, Tan, Wang, & Zhang, 2013) and image denoising (Gupta, Mahle, & Shriwas, 2013) respectively before being converted into image frames. Section 3.1 describes how the ROI is computed from consecutive frames, Section 3.2 presents our novel feature descriptor based on Multi-block Bio-Inspired Features (MB-BIF), and Section 3.3 describes the classification process.

ROI Determination
In normal pedestrian detection, the detector searches all possible sub-windows of an image to determine whether they contain a pedestrian, but this technique is computationally expensive. Our ROI extraction therefore narrows down the search region. The differencing image technique detects consistent and reliable regions of interest with bright intensity contrasts. Figure 2 presents the flowchart of the region of interest computation process.
Comparing two adjacent frames, we can detect the dissimilarities and extract features from their differences. Taking two frames at times T and T + t, the differencing image is computed at each point A(x, y) from the difference between the two frames.
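The differencing step can be sketched as follows. This is a minimal illustration that assumes the differencing image is the absolute per-pixel intensity difference of two grayscale frames; the `threshold` parameter and the bounding-box helper are assumptions added for the example and are not taken from the paper.

```python
def difference_image(frame_a, frame_b, threshold=0):
    """Absolute per-pixel difference of two equally sized grayscale frames.

    Differences that do not exceed `threshold` (an assumed parameter) are set to 0.
    """
    if len(frame_a) != len(frame_b) or any(
        len(ra) != len(rb) for ra, rb in zip(frame_a, frame_b)
    ):
        raise ValueError("frames must have the same shape")
    diff = []
    for row_a, row_b in zip(frame_a, frame_b):
        diff.append([abs(b - a) if abs(b - a) > threshold else 0
                     for a, b in zip(row_a, row_b)])
    return diff


def bounding_box(diff):
    """Smallest (top, left, bottom, right) box holding every nonzero pixel, or None."""
    coords = [(y, x) for y, row in enumerate(diff) for x, v in enumerate(row) if v]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return (min(ys), min(xs), max(ys), max(xs))


# Frame at time T and frame at time T + t (toy 3 x 4 images).
frame_t = [[10, 10, 10, 10],
           [10, 10, 10, 10],
           [10, 10, 10, 10]]
frame_t_plus = [[10, 10, 10, 10],
                [10, 90, 95, 10],
                [10, 10, 80, 10]]
roi = bounding_box(difference_image(frame_t, frame_t_plus, threshold=20))
```

Only the pixels that changed between the two frames survive, so the later stages need to scan the box `roi` rather than the whole image.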

Multi-Block Bio-Inspired Features (MB-BIF)
In this sub-section, we introduce the MB-BIF descriptor for pedestrian detection. The MB-BIF framework contains four layers. The L1 layer is built of filters, the L2 layer extracts nonlinear features, the L3 layer uses MB-LBP to extract features, and the L4 layer reduces the dimension and improves the discriminative power of the features.

L1 Layer
The Gabor filter focuses on local areas to extract information at a precise scale and orientation. For a given orientation and scale it responds to edges and bars along that direction and extracts information in the corresponding frequency band. The Gabor filter therefore extracts additional information in some significant body areas, which is very useful for pedestrian representation.
Here we analyze the input image with an array of filters corresponding to the cells of the primary visual cortex (V1). Because it is a good cortical model (Guo, Mu, Fu, & Huang, 2009), the Gabor filter is applied in L1. Considering the image $I(x, y)$ with width w and height h, the L1 response is the convolution $I(x, y) * \psi_{\mu,\nu}(z)$, with $z = (x, y)$, of the image with the Gabor kernel

$$\psi_{\mu,\nu}(z) = \frac{\|k_{\mu,\nu}\|^2}{\sigma^2} \exp\left(-\frac{\|k_{\mu,\nu}\|^2 \|z\|^2}{2\sigma^2}\right) \left[e^{i k_{\mu,\nu} z} - e^{-\frac{\sigma^2}{2}}\right]$$

and $k_{\mu,\nu} = k_\nu e^{i\phi_\mu}$, $k_\nu = 2^{-\frac{\nu+2}{2}} \pi$, $\phi_\mu = \mu \frac{\pi}{8}$, $\mu = 0, 1, \dots, U-1$, $\nu = 0, 1, \dots, V-1$, where $\mu$ and $\nu$ are the orientation and scale parameters, respectively. There are 8 bands (16 scales) in normal bio-inspired features, with 2 filters for each band and a variable number of orientations (4, 8, 16). Most detection and recognition systems apply 128 Gabor Magnitude Pictures (GMPs) with 8 bands and 8 orientations; here we use the same strategy. The GMPs are $\{G^q_{ij};\ i = 0, \dots, m;\ j = 0, \dots, n;\ q = 0, 1\}$, where $G^0_{ij}$ and $G^1_{ij}$ are the filtered values at band i and orientation j, m is the number of bands, n is the number of orientations, and q is the scale index. Given an image of size 64 × 128, each GMP has 8192 dimensions, and the output dimension of the L1 layer is 64 × 128 × 16 × 8 = 1,048,576.
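As an illustration of the kernel parameters above, the sketch below evaluates a Gabor kernel at a single offset and reproduces the L1 dimension count. It assumes the commonly used kernel form with wave vector $k_\nu e^{i\phi_\mu}$, treats $\nu$ as the scale index and $\mu$ as the orientation index, and assumes $\sigma = 2\pi$, a value the text does not state. The function names are hypothetical.

```python
import cmath
import math


def gabor_kernel_value(x, y, mu, nu, sigma=2 * math.pi):
    """Value of the Gabor kernel at offset z = (x, y).

    mu is the orientation index and nu the scale index; sigma = 2*pi is an
    assumed default, not a value given in the text.
    """
    k_nu = 2 ** (-(nu + 2) / 2) * math.pi   # magnitude of the wave vector
    phi_mu = mu * math.pi / 8               # orientation angle
    k = k_nu * cmath.exp(1j * phi_mu)       # wave vector stored as kx + i*ky
    k_sq = abs(k) ** 2
    z_sq = x * x + y * y
    envelope = (k_sq / sigma ** 2) * math.exp(-k_sq * z_sq / (2 * sigma ** 2))
    # Oscillating carrier minus the DC-compensation term.
    carrier = cmath.exp(1j * (k.real * x + k.imag * y)) - math.exp(-sigma ** 2 / 2)
    return envelope * carrier


def l1_output_size(width=64, height=128, scales=16, orientations=8):
    """Number of values produced by L1: one response per pixel, scale and orientation."""
    return width * height * scales * orientations


center = gabor_kernel_value(0, 0, mu=0, nu=0)
```

With 16 scales and 8 orientations there are 128 response maps of 64 × 128 = 8192 values each, which is where the figure of 1,048,576 comes from.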

L2 Layer
From the output of the previous layer, the maximum over the two consecutive scales of each band is retained:

$$L2_{ij} = \max\left(G^0_{ij}, G^1_{ij}\right)$$

where $L2_{ij}$ is the maximum value of two consecutive L1 units at band i and orientation j. The output of the L2 layer in turn is computed as:

$$L2 = (L2_{11}, L2_{12}, \dots, L2_{1n}, L2_{21}, L2_{22}, \dots, L2_{mn}) \quad (4)$$
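A minimal sketch of this pooling step, assuming the maximum is taken pixel by pixel between the two filtered maps of a band (the text does not say whether any spatial pooling is applied as well). The nested-list layout `gmps[i][j] = (G0, G1)` is a hypothetical data structure chosen for the example.

```python
def l2_unit(g0, g1):
    """Pixel-wise maximum of the two filtered maps of one band and orientation."""
    return [[max(a, b) for a, b in zip(row0, row1)] for row0, row1 in zip(g0, g1)]


def l2_layer(gmps):
    """Flatten the pooled maps in the order L2_11, L2_12, ..., L2_1n, L2_21, ..., L2_mn.

    gmps[i][j] is assumed to hold the pair (G0_ij, G1_ij) for band i, orientation j.
    """
    return [l2_unit(g0, g1) for band in gmps for (g0, g1) in band]


# One band, one orientation, 2 x 2 maps.
pooled = l2_unit([[1, 5], [3, 2]], [[4, 0], [3, 9]])

# 8 bands x 8 orientations of tiny 1 x 1 maps give 64 pooled maps.
toy = [[([[i]], [[j]]) for j in range(8)] for i in range(8)]
layer = l2_layer(toy)
```

Pooling halves the number of maps from 128 to 64 while keeping the stronger of the two scale responses at each position.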

L3 Layer
Here, we introduce the Multi-block Local Binary Pattern (MB-LBP) bio-inspired features. MB-LBP features are extracted instead of the common Haar-like features and are calculated rapidly through the integral image. The extracted features are few, but they carry rich information and are distinctive. The simple difference method of Haar-like features is replaced by encoding rectangular regions with the local binary pattern operator. The original LBP operator considers measurements from a 3 × 3 pixel square (Ojala, Pietikainen, & Harwood, 1996). The binary code is computed for every pixel by thresholding the values of the 3 × 3 neighborhood against the center value and reading the result as a binary number. The multi-block LBP follows the same rule as the LBP operator but encodes rectangular areas instead of single pixels. In the MB-LBP operator, $g_c$ is the central rectangle and the $g_i$ are its eight neighborhood rectangles.
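The operator can be sketched as below. The sketch assumes the usual LBP convention (a bit is set when the neighbor's mean intensity is at least the center's mean) and an arbitrary clockwise bit ordering starting at the top-left block; neither detail is confirmed by the text. Block means are computed by direct summation here to keep the example short.

```python
def block_mean(img, top, left, h, w):
    """Mean intensity of the h x w block whose top-left pixel is (top, left)."""
    total = sum(img[y][x] for y in range(top, top + h) for x in range(left, left + w))
    return total / (h * w)


# (row, col) positions of the 8 neighbor blocks in the 3 x 3 grid,
# clockwise from the top-left block (an assumed ordering).
NEIGHBORS = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def mb_lbp(img, top, left, block_h, block_w):
    """MB-LBP code (0..255) of the 3 x 3 grid of blocks starting at (top, left)."""
    center = block_mean(img, top + block_h, left + block_w, block_h, block_w)
    code = 0
    for bit, (r, c) in enumerate(NEIGHBORS):
        g = block_mean(img, top + r * block_h, left + c * block_w, block_h, block_w)
        if g >= center:
            code |= 1 << bit
    return code


# With 1 x 1 blocks the operator reduces to the original 3 x 3 LBP.
corners_bright = [[9, 1, 9],
                  [1, 5, 1],
                  [9, 1, 9]]
code = mb_lbp(corners_bright, 0, 0, 1, 1)

# A flat image sets every bit, whatever the block size.
flat = [[7] * 6 for _ in range(6)]
flat_code = mb_lbp(flat, 0, 0, 2, 2)
```

Eight comparison bits give 2^8 = 256 possible codes, matching the number of binary patterns mentioned in the text.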
Like the original Local Binary Pattern, MB-LBP can capture a variety of structures; 256 kinds of binary patterns are computed, but not all of them are used to represent body shape (see Figure 5). The rectangle sums are obtained from the summed area table (Crow, 1984). As the name suggests, the value at any point (x, y) in the summed area table is the sum of all the pixels above and to the left of (x, y), inclusive.
The summed area table is computed in a single pass over the image, using the fact that its value at (x, y) is:

$$s(x, y) = i(x, y) + s(x - 1, y) + s(x, y - 1) - s(x - 1, y - 1)$$

where $i(x, y)$ is the pixel value and $s(x, y)$ the table value at (x, y). After the summation, the sum over any rectangle can be evaluated in constant time with just four array references, as seen in Figure 6.
Figure 6. Finding the sum of a rectangular area
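The single-pass construction and the four-reference lookup can be sketched as follows. As an implementation convenience (an assumption of this example, not something the paper specifies), the table carries one extra row and column of zeros so that no boundary checks are needed.

```python
def summed_area_table(img):
    """Build the table in one pass; sat[y][x] is the sum of img rows < y and columns < x."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            sat[y + 1][x + 1] = img[y][x] + sat[y][x + 1] + sat[y + 1][x] - sat[y][x]
    return sat


def rect_sum(sat, top, left, bottom, right):
    """Sum of img[top..bottom][left..right], bounds inclusive, from four table lookups."""
    return (sat[bottom + 1][right + 1] - sat[top][right + 1]
            - sat[bottom + 1][left] + sat[top][left])


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
sat = summed_area_table(img)
lower_right = rect_sum(sat, 1, 1, 2, 2)   # 5 + 6 + 8 + 9
```

The cost of `rect_sum` does not depend on the rectangle size, which is what makes block-based features such as MB-LBP cheap to evaluate at any scale.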

L4 Layer
The feature vector of each pedestrian image is formed by concatenating the L3 features. The L4 layer applies Principal Component Analysis (PCA) to reduce the dimension of the resulting features and enhance their discriminative ability. In PCA, the projection matrix W is composed of the orthogonal eigenvectors of the covariance matrix of all the training samples. To further improve the dimension reduction, we turn to feature vector partitioning.
The output L of the L3 layer can be rewritten as a sequence of k feature segments, where $H_n$ is the n-th feature segment with a specific number of spatial histograms. For each feature segment $H_n$, a PCA model is computed to obtain a low-dimensional representation in the corresponding subspace. Thus, in MB-BIF, by building k subspaces, the input pedestrian image is finally represented by the concatenation of its k low-dimensional projections.
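The partition-then-project idea can be sketched as follows. This is a simplified illustration under several assumptions: segments have equal length, each subspace keeps only its first principal component, and the leading eigenvector is found by power iteration rather than a full eigendecomposition. All function names are hypothetical.

```python
import math


def partition(vector, k):
    """Split a feature vector into k equal-length segments H_1..H_k."""
    if len(vector) % k:
        raise ValueError("vector length must be divisible by k")
    size = len(vector) // k
    return [vector[n * size:(n + 1) * size] for n in range(k)]


def fit_segment(samples, iterations=200):
    """Mean and leading principal direction of one segment, by power iteration."""
    n, d = len(samples), len(samples[0])
    mean = [sum(s[j] for s in samples) / n for j in range(d)]
    centered = [[s[j] - mean[j] for j in range(d)] for s in samples]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)] for a in range(d)]
    # Asymmetric start vector; the iteration only fails if this vector happens
    # to be orthogonal to the leading eigenvector (or the segment has no variance).
    v = [float(j + 1) for j in range(d)]
    scale = math.sqrt(sum(x * x for x in v))
    v = [x / scale for x in v]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v


def fit(training_vectors, k):
    """One PCA model (mean, direction) per feature segment."""
    split = [partition(v, k) for v in training_vectors]
    return [fit_segment([segments[n] for segments in split]) for n in range(k)]


def represent(vector, models):
    """Concatenate the projections of each segment onto its own subspace."""
    return [sum((x - m) * c for x, m, c in zip(segment, mean, direction))
            for segment, (mean, direction) in zip(partition(vector, len(models)), models)]


train = [[1, 1, 0, 0], [2, 2, 0, 3], [3, 3, 0, 6]]
models = fit(train, k=2)
rep = represent([3, 3, 0, 6], models)
```

Fitting one small model per segment keeps each covariance matrix small, which is the practical motivation for partitioning a very long feature vector before applying PCA.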

Enhanced Fisher Linear Discriminant Analysis
Fisher linear discriminant analysis, introduced by Fisher (1936), ensures optimal separation in the new subspace by searching for the optimal projection vector $W_{opt}$ from n samples in the mapped 1-D domain:

$$W_{opt} = \arg\max_{\Phi} \frac{\Phi^T S_B \Phi}{\Phi^T S_W \Phi}$$

$$S_B = \sum_{i=1}^{c} n_i (u_i - \bar{x})(u_i - \bar{x})^T$$

is the between-class scatter matrix. The between-class function characterizes the discriminant structure of the data; it makes the data points of different classes as distant as possible.

$$S_W = \sum_{i=1}^{c} \sum_{x_k \in \text{class } i} (x_k - u_i)(x_k - u_i)^T$$

is the within-class scatter matrix. The within-class function characterizes the similarity between data from the same class; it makes the data points from the same class as close as possible. Here $\Phi$ is an n-dimensional vector, $u_i$ is the mean of class i, c is the number of classes, $n_i$ is the number of samples in class i, $x_k$ is the k-th sample, and $\bar{x}$ is the total sample mean. The criterion is maximized when $\Phi$ equals the optimal projection vector, obtained from $[V, D] = \mathrm{eig}(S_W^{-1} S_B)$, where V and D represent the eigenvalues and the eigenvectors of $S_W^{-1} S_B$ respectively, and $W_{opt}$ is the eigenvector in D corresponding to the maximum eigenvalue in V. The Fisher classifier is expressed in terms of $W_{opt}$, the optimal projection vector; x, the input sample; and the threshold $\theta = \frac{n_1 u_1 + n_2 u_2}{n_1 + n_2}$, where $n_i$ is the number of samples in class i and $u_i$ is the mean of the samples in class i.
The within-class function is prone to map some points that are not very close to each other to a subspace in which they are very close to each other. This indicates that the variation of the data within a class is impaired and lost in the reduced space (Weinberger, Packer, & Saul, 2005). The enhanced discriminant analysis incorporates the variation of data from the same class into the Fisher criterion to avoid the over-fitting problem in the reduced space (Gao, Liu, Zhang, Hou, & Yang, 2012). The discriminant feature extraction criterion is rewritten with two additional quantities: $S_D$, the variability scatter matrix, and $\varepsilon$, a constant used to control the balance between the discriminative ability and the stability of the algorithm.
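For the two-class case used here (pedestrian versus non-pedestrian), the plain Fisher step can be sketched as below. The sketch covers only the classical criterion, not the enhanced variant with $S_D$ and $\varepsilon$. It relies on the standard two-class result that the optimal direction is proportional to $S_W^{-1}(u_1 - u_2)$, restricts itself to 2-D features so the matrix inverse can be written by hand, and assumes the threshold $\theta$ is applied to the projected class means. The decision rule itself is an assumption, since the classifier equation is not given in the text.

```python
def mean(vectors):
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(2)]


def scatter(vectors, mu):
    """2 x 2 scatter matrix: sum of (x - mu)(x - mu)^T over the class."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = [v[0] - mu[0], v[1] - mu[1]]
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]


def fisher_direction(class1, class2):
    """Two-class Fisher direction w = S_W^{-1} (u1 - u2), for 2-D samples."""
    u1, u2 = mean(class1), mean(class2)
    s1, s2 = scatter(class1, u1), scatter(class2, u2)
    sw = [[s1[a][b] + s2[a][b] for b in range(2)] for a in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [u1[0] - u2[0], u1[1] - u2[1]]
    w = [dot(inv[0], diff), dot(inv[1], diff)]
    return w, u1, u2


def threshold(w, u1, n1, u2, n2):
    """Sample-weighted mean of the projected class means (assumed reading of theta)."""
    return (n1 * dot(w, u1) + n2 * dot(w, u2)) / (n1 + n2)


def classify(x, w, theta):
    """Return 1 if the projection of x falls on the class-1 side, otherwise 2."""
    return 1 if dot(w, x) > theta else 2


class1 = [(4, 2), (5, 3), (6, 2), (5, 1)]
class2 = [(0, 0), (1, 1), (2, 0), (1, -1)]
w, u1, u2 = fisher_direction(class1, class2)
theta = threshold(w, u1, len(class1), u2, len(class2))
```

For more than two classes or higher dimensions, the direction would instead come from the leading eigenvector of $S_W^{-1} S_B$, as described in the text.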

Pedestrian Detection
This section describes the proposed approach for pedestrian detection. It starts with the ROIs, continues with MB-BIF, and ends with EFLDA. A pedestrian is considered detected if the candidate image passes every test.

Dataset
We evaluate our work on the TUD-Brussels dataset and the Caltech pedestrian benchmark. TUD-Brussels contains pedestrians at various scales and viewpoints. It comes with 508 images of 640 × 480 pixels, 1326 annotated pedestrians, and 192 negative images. Caltech Pedestrians is a large, publicly available pedestrian dataset that offers a large number of samples. The annotated dataset consists of nearly 2.5 hours of video captured at 30 fps. It contains 11 segmented sessions, 6 for training (S0 to S5) and 5 for testing (S6 to S10). The individuals in these datasets appear in many positions and orientations and against a variety of backgrounds.

Region of Interest
The regions of interest are computed to speed up the detection system. The strategy consists of filtering out unwanted image regions. Using the histogram of the differencing image, pedestrian positions are computed from significant motion (indicated by the closest peaks in the histogram).

The Features Extraction
For the MB-LBP feature, the image model of Kuo, Yang, and Yen (2012) serves as an effective aid in determining the MB-BIF feature extraction frame.

Detection Process
In the preprocessing step, the region of interest is located. The detection window is applied to the candidate region given by the preprocessing step; features are extracted and then submitted to the classifier for testing. A window is considered to contain a pedestrian only if it passes all the steps. The information concerning that window is recorded to mark the region of the detected pedestrian.
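The control flow of this process can be sketched as follows. The 64 × 128 window matches the image size used for the features, while the stride of 8 pixels and the callables `extract_features` and `is_pedestrian` are placeholders assumed for the example; they stand in for the MB-BIF extractor and the EFLDA classifier rather than implementing them.

```python
def sliding_windows(roi, win_w=64, win_h=128, stride=8):
    """Yield (left, top, right, bottom) windows that lie fully inside the ROI.

    roi is (left, top, right, bottom) with exclusive right and bottom bounds.
    """
    left, top, right, bottom = roi
    for y in range(top, bottom - win_h + 1, stride):
        for x in range(left, right - win_w + 1, stride):
            yield (x, y, x + win_w, y + win_h)


def detect(roi, extract_features, is_pedestrian, **window_args):
    """Record every window in the ROI whose features the classifier accepts."""
    return [window for window in sliding_windows(roi, **window_args)
            if is_pedestrian(extract_features(window))]


# Toy run: the "features" are the window itself and the "classifier"
# accepts only the window that starts at x = 8.
windows = list(sliding_windows((0, 0, 80, 128)))
hits = detect((0, 0, 80, 128),
              extract_features=lambda window: window,
              is_pedestrian=lambda features: features[0] == 8)
```

Because windows are generated only inside the ROI, the number of classifier evaluations scales with the size of the moving region rather than with the full frame.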

Experiment and Results
This part presents the experimental results and analysis. We ran the experiments on a 2.2 GHz Intel Core 2 Duo T6600 with 4 GB of memory using MATLAB Release R2011b. Figure 8 presents the performance of the features; the MB-BIF approach shows good performance, followed by the region covariance descriptor. Performance is evaluated using detection rate versus false positive rate. Figure 9 shows the detected pedestrians. Table 2 presents the processing speed of our approach on our recorded video: an image size of 720 × 480 is processed at 15 fps, 640 × 480 at 22 fps, and 320 × 240 at 71 fps. The whole system worked at an average speed of 33 fps. We also noticed that the detection time increases with the size of the image as well as with the number of pedestrians in the image.

Conclusion
Motivated by the success of MB-LBP and BIF, we used them for moving pedestrian detection. Firstly, the ROI determination filters out similar frames and unwanted sub-windows to reduce the computation time. Secondly, the combination of MB-LBP and BIF extracts features from the candidate region. Thirdly, a highly discriminative classification is obtained using EFLDA preceded by PCA feature reduction. The experimental results demonstrate the performance and effectiveness of our approach, which is fast enough to analyze high-frame-rate video while using less memory.
Although person detection techniques have proven reliable, there is still room for improvement in real-time detection. In future work, we wish to extend the current approach to the detection of handicapped pedestrians and to include novel classifiers and feature descriptors to optimize the detector.

Figure 5. Samples of MB-LBP features

Figure 7 shows a pedestrian image of 64 × 128 pixels segmented into 8 × 16 square cells of 8 × 8 pixels. The model is built from the average pixel value in each square cell. The MB-LBP filters are used to recognize the distinctive features and their positions.

Figure 7. Pedestrian model

Figure 8. Performance evaluation

Figure 9. Detected pedestrians, marked with red bounding boxes

Table 1 presents the average MB-BIF result on the TUD-Brussels Pedestrian and Caltech Pedestrian datasets. A detection is counted as correct if it overlaps the annotation by at least 70%, using the intersection-over-union measure PC(r, o) = |r ∩ o| / |r ∪ o|.
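The overlap criterion can be written out for axis-aligned boxes as follows. The `(left, top, right, bottom)` box layout and the function names are assumptions of this sketch; the 0.7 threshold is the one stated above.

```python
def area(box):
    """Area of a (left, top, right, bottom) box."""
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])


def intersection_over_union(r, o):
    """PC(r, o) = |r ∩ o| / |r ∪ o| for two axis-aligned boxes."""
    inter_w = min(r[2], o[2]) - max(r[0], o[0])
    inter_h = min(r[3], o[3]) - max(r[1], o[1])
    inter = max(0, inter_w) * max(0, inter_h)
    union = area(r) + area(o) - inter
    return inter / union if union else 0.0


def is_correct(detection, annotation, min_overlap=0.7):
    """A detection counts as correct when its overlap reaches min_overlap."""
    return intersection_over_union(detection, annotation) >= min_overlap


good = is_correct((0, 0, 10, 10), (0, 0, 10, 8))    # overlap 80 / 100
bad = is_correct((0, 0, 10, 10), (5, 0, 15, 10))    # overlap 50 / 150
```

Dividing by the union rather than by the annotation area penalizes detections that are much larger than the pedestrian as well as those that miss part of it.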

Table 1. Accuracy of our detection system

Table 2. The processing speed of our system