Objects Movement Detection Based on Impulsed Neural Network of the Analogue of the Biological Retinal Model

This paper describes a neural network model based on an impulse neural network analogous to the biological retina. The model makes it possible to identify moving objects in a video image, and a motion detector based on retinal operation is presented. The proposed detector is an alternative to detectors based on deterministic methods and traditional neural networks. It requires fewer computational resources at the same video image processing speed.


Related Works
Early moving object detection was based on background subtraction methods and noise minimization in the image [Maddalena, L. & Petrosino, A., 2008/2012/2014]. P. Gil-Jimenez [Gil-Jimenez, P., Maldonado-Bascon, R. & Gil-Pita, H., 2003] proposed using the classification capacity of a neural network to decrease the false detection probability. Instead of thresholding the difference between the current frame and the reference, as is typical, the system first classifies each zone of the image according to its observed behavior and then performs motion detection according to this classification. The main drawback of the proposed method is the high computational cost of the classification for every pixel. Moreover, the background behavior is assumed not to change frequently.
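The typical thresholding step mentioned above can be sketched as follows. This is a minimal illustration in Python and not the method of any cited work; the function name `frame_difference_mask` and the threshold of 25 gray levels are assumptions chosen for the example.

```python
def frame_difference_mask(current, reference, threshold=25):
    """Return a binary mask with 1 where |current - reference| exceeds threshold.

    Frames are lists of rows of grayscale values (0..255).
    The threshold value is an illustrative assumption.
    """
    return [
        [1 if abs(c - r) > threshold else 0 for c, r in zip(cur_row, ref_row)]
        for cur_row, ref_row in zip(current, reference)
    ]


# A static reference frame and a current frame with one strongly changed pixel
# plus small fluctuations that stay below the threshold.
reference = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
current = [[10, 12, 10], [10, 200, 10], [10, 10, 9]]

mask = frame_difference_mask(current, reference)
print(mask)  # [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
```

Every pixel of every frame must be compared against a maintained reference, which is the per-frame cost that the retina-based approach discussed later tries to avoid.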
Z. Wang developed a new cooperative background model for multi-modal video surveillance based on a probability neural network [Wang, Z., Bao, H. & Zhang, L., 2009] [Wang, Z. & Bao, H., 2011]. First, the probability of being foreground is estimated in the visible and infrared channels and post-processed separately. Second, every pixel is classified as a foreground, background, or change pixel by fusing this information, and foreground pixels are segmented into motion regions. Third, the adaptive learning rate is computed for every frame and every pixel based on the frame motion difference and the pixel classification result, and the background model for every channel is updated.
In 2006 D. Culibrk proposed a novel neural-network-based approach to background modeling for motion-based object segmentation in video sequences [Culibrk, D., Marques, O., Socek, D., Kalva, H. & Furht, B., 2006]. The proposed approach is designed to enable efficient, highly parallelized hardware implementation. Such a system can achieve real-time segmentation of high-resolution sequences. The basis of the approach is a novel neural network structure designed specifically to serve as a model of the background in video sequences, together with a Bayesian classifier used for object segmentation. The new Background Modeling Neural Network is an unsupervised classifier.
L. Maddalena and A. Petrosino [Maddalena, L. & Petrosino, A., 2008/2012/2014] developed the SOBS (Self-Organizing Background Subtraction) algorithm and 3dSOBS+. The SOBS algorithm implements an approach to moving object detection based on a neural background model automatically generated by a self-organizing method, without prior knowledge about the patterns involved. Such an adaptive model can handle scenes containing moving backgrounds, gradual illumination variations, and camouflage, can include shadows cast by moving objects in the background model, and achieves robust detection for different types of videos taken with stationary cameras. Moreover, the introduction of spatial coherence into the background update procedure leads to the so-called SC-SOBS algorithm, which provides further robustness against false detections.
In [Ramirez-Alonso, G. & Chacon-Murguia, M., 2015] [Ramirez-Quintana, J., Chacon-Murguia, M. & Ramirez-Alonso, G., 2018] authors presented a video segmentation algorithm that takes advantage of using a background subtraction (BS) model with a low learning rate or a BS model with a high learning rate depending on the video scene dynamics. These BS models are based on neural network architecture, the self-organized map (SOM), and the algorithm termed temporal modular self-adaptive SOM, TMSA_SOM. Depending on the type of scenario, the TMSA_SOM automatically classifies and processes each video into one of four different specialized modules based on initial sequence analysis. This approach is convenient because, unlike state-of-the-art models, the proposed model solves different situations that may occur in the video scene with a specialized module. Furthermore, TMSA_SOM automatically identifies whether the scene has drastically changed and automatically detects when the scene has become stable again and uses this information to update the background model in a fast way.
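The role of the learning rate in a background subtraction model can be illustrated with a generic running-average background update. This sketch is not the TMSA_SOM algorithm or any SOM-based model; the update rule, the function name `update_background`, and the rate values 0.01 and 0.5 are assumptions used only to show how a low rate preserves the background while a high rate absorbs scene changes quickly.

```python
def update_background(background, frame, learning_rate):
    """Blend the new frame into the background model.

    Generic running average (an assumption for illustration):
    B_new = (1 - rate) * B_old + rate * frame
    """
    return [
        [(1.0 - learning_rate) * b + learning_rate * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


background = [[100.0, 100.0]]
frame = [[100.0, 200.0]]  # the second pixel suddenly becomes brighter

slow = update_background(background, frame, learning_rate=0.01)
fast = update_background(background, frame, learning_rate=0.5)

print(slow[0][1])  # close to 101: the change barely enters the background
print(fast[0][1])  # 150.0: the change is absorbed quickly
```

A low rate keeps a slowly moving object in the foreground, whereas a high rate recovers faster after a drastic scene change, which is the trade-off the cited algorithm resolves by selecting a module per scenario.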
All methods and approaches considered above are aimed at finding the background, eliminating noise, and segmenting the frame. Their disadvantage is that the processing unit receives no information about the movement of objects in the image in real time. It receives a pixel matrix of the image without any additional information, and the entire load of analyzing the current frame in real time falls on the computing unit/controller (Figure 1). In other words, the computing unit can be compared to a human brain into which a frame image has been inserted. The brain must analyze this image based on the history of previous frames and processing results. Moreover, the brain does not know how the image was obtained: from a live video camera stream, from watching a movie, from a fragment of a dream, or from a mirage or imagination. The brain simply gets a matrix of numbers. To find moving objects, the computing center must extract the background from the frame. In addition, further processing of the frame is required to remove noise and interference (illumination changes, waving trees, cast shadows, moving background, camouflage, etc.), followed by region segmentation.
In the proposed approach, the light flux reflected from the objects falls on the retina receptors (described by the impulse neural network model), which record the energy/pulse change in the spike neurons (each of which corresponds to a pixel of the image frame) and generate an output matrix of impulse changes, i.e. the matrix of pixels in which the energy of the impulse signal has changed (the matrix of moving objects). In this way, the computing center (brain center) obtains a matrix of all moving objects directly from the sensor retina (impulse neural network model) without computing the background and subtracting it from the current image frame.
In this case, the computing center detects moving objects according to patterns and removes the impulse signal values caused by lighting changes, objects swaying in the wind, cast shadows, dynamic background, etc.
The originality of the proposed method (Functional scheme depicted in Figure  Detection and recognition of the moving object, as well as the elimination of false alarms and removal of noise, will be carried out in the human brain (brain center, computing center, controller) based on the association of object shape patterns, characteristic patterns of object movement, and imagination derived from previous life experience. The impulsed neural network (the retina of the eye) is designed only to detect movement through the retina's reaction to the energy of light reflected from surrounding objects. An impulsed neural network does not detect a specific type of object or determine its shape. For this purpose, additional models of computation in the brain center are needed. For example, a sudden change in the lighting of a room (a light switched on or off) will be identified as movement of the whole environment around the observer. Therefore, the application of impulse neural networks requires the development of additional special methods and tools for selecting and detecting objects and eliminating noise. The retina of the eye serves as a sensor-indicator of moving objects, color/brightness changes, etc.
The advantage of the proposed approach is that it is not necessary to calculate the background matrix of the image and subtract it from the current image frame. This work describes the modeling of the impulsed neural network (retina) and the possibility of using this model to indicate moving objects in the image.
Both functional detection schemes (Figures 1, 2) require the elimination of noise in the computing center/controller, but the amount of computation in each scheme is different. The use of impulse neural networks makes it possible to reduce the amount of computation in the controller. However, this approach requires retina modeling (a special sensor based on the analog of the biological impulsed neural network of the retina).
This work focuses on motion indication and the development of motion sensors, not on detecting the type of moving object. The retina of the eye does not detect the type of moving object; it sends a matrix of changes in the luminosity of the light reflected from surrounding objects.
Technologies and methods based on impulse neural networks that model the retina belong to the class of unsupervised methods. The sensitivity threshold for the impulse/energy change of a neuron can be adjusted, as can the parameters of the impulsed neuron and impulsed neural network models. All sensors are by nature designed to send the measured signal value to the controller and operate on unsupervised principles.

Indicating Movement Objects Using Impulse Neural Networks
It is easy for a person to quickly isolate moving objects. But behind this skill lies a rather complex system for processing visual information: the retina of the eye [Masland, R.H., 2001] [Wassle, H., 2004]. The retina consists of complex chains of neurons, with photoreceptors (photosensitive cells) at the front, which directly receive optical signals and convert them into physiological excitations. The excitation from the photoreceptors is transmitted via interneurons, which communicate synaptically with each other and connect the photoreceptors to retinal ganglion cells, which send signals through the optic nerve to the brain. Different cell types are responsible for processing different image characteristics: brightness, color, movement of objects, etc. Under the influence of an external stimulus, whether a light signal or a signal from a neighboring neuron, a neuron begins to emit pulses of the same amplitude; the stronger the external stimulus, the more frequent the pulses. The work [Olveczky, B., Baccus S. & Meister M., 2003] shows how a network of ganglion cells can instantly detect a moving object and even isolate several such objects. The results presented in [Masland, R.H., 2001] [Wassle, H., 2004] [Olveczky, B., Baccus S. & Meister M., 2003] [Maass, W., 1997] reveal the physiological principles by which the retina separates moving objects. But is it possible to use this information to create artificial neural networks that can isolate moving objects just as quickly and accurately?
The works of [Wu, Q. 50 2020]. The output of a neuron is composed of short electrical impulses (also called action potentials or spikes). The shape of the impulses does not change as they travel along the axon. A chain of action potentials emitted by a single neuron is called a pulse sequence (spike train): a series of identical events occurring at specific or random moments in time. Since all generated impulses have roughly the same shape, the information is carried not by the form of the impulses but by their number and the exact timing of their occurrence.
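Because the information lies in the number and timing of impulses, a spike train can be summarized by its firing rate and its inter-spike intervals. The following minimal Python sketch shows both quantities; the function names and the example spike times are hypothetical values chosen for illustration and do not come from the cited works.

```python
def firing_rate_hz(spike_times_ms, window_ms):
    """Mean firing rate in spikes per second over an observation window."""
    return 1000.0 * len(spike_times_ms) / window_ms


def interspike_intervals(spike_times_ms):
    """Time differences between consecutive spikes, in milliseconds."""
    return [later - earlier
            for earlier, later in zip(spike_times_ms, spike_times_ms[1:])]


# Two hypothetical spike trains observed for 40 ms:
# a weak stimulus (few spikes) and a strong stimulus (many spikes).
weak = [5.0, 25.0]
strong = [2.0, 7.0, 12.0, 17.0, 22.0, 27.0, 32.0, 37.0]

print(firing_rate_hz(weak, 40.0))    # 50.0
print(firing_rate_hz(strong, 40.0))  # 200.0
print(interspike_intervals(weak))    # [20.0]
```

All spikes are treated as identical events, so only the list of spike times is stored, never the shape of an individual impulse.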

Model of an Impulse Neural Network for Detecting Moving Objects
The general structure of the impulse neural network used to isolate moving objects and used in the detector being developed is shown in Figure 3. [...] accordingly, the interneuron $N_1(x, y)$ will be at rest. The situation is similar for interneuron $N_2(x, y)$. If the current of the receptor $(x, y)$ increases, i.e. $I(x, y, t) > I(x, y, t - \Delta t)$, the balance is disturbed, since the signal coming from the excitatory synapse is stronger than the signal delayed for a period $\Delta t$ coming from the inhibitory synapse. The interneuron $N_1(x, y)$ begins to generate impulses (spikes). If the current of the receptor $(x, y)$ decreases, i.e. $I(x, y, t) < I(x, y, t - \Delta t)$, the interneuron $N_1(x, y)$ does not react, but interneuron $N_2(x, y)$ starts to generate impulses, since the signal delayed for a period $\Delta t$ coming from the excitatory synapse is stronger than the signal from the inhibitory synapse. In other words, the neural network begins to react to variations in pixel brightness that can be caused by objects moving over a static background.
The output layer of the neural network has the same dimensions as the input layer and the hidden layer. Each neuron $(x, y)$ of this layer corresponds to the pixel $(x, y)$ of the output frame of the video image. The interneurons $N_1(x, y)$ and $N_2(x, y)$ are connected by excitatory synapses without delay to the output neuron $(x, y)$. The output neuron produces signals only when it receives impulses from interneuron $N_1(x, y)$ or $N_2(x, y)$; otherwise, it is at rest. The grayscale value of each pixel $(x, y)$ of the output video frame is proportional to the frequency of impulse generation by the output neuron $(x, y)$ and equals 0 (black) if the output neuron $(x, y)$ does not generate any signals over a certain time period $T$. Otherwise, the brightness of the pixel $(x, y)$ will be above 0 (Figure 4).

Figure 4. Selecting a Moving Object by Impulse Neural Network
The result of simulating the impulse neural network for the ideal case, with a static background and no noise, is shown in Figure 4. This simulation serves educational purposes: it illustrates how an impulsed neural network detects/indicates moving objects.
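The behavior of the three-layer network described above can be approximated by a rate-based sketch that skips the spiking dynamics. It assumes that the receptor current follows the pixel brightness and that the firing rate of an interneuron is proportional to the size of the brightness change; the function name `motion_frame` and the parameters `gain` and `min_change` (a sensitivity threshold) are hypothetical and not taken from the paper.

```python
def motion_frame(prev_frame, curr_frame, gain=1.0, min_change=0):
    """Rate-based approximation of the ON/OFF interneuron network.

    prev_frame holds brightness at time t - dt, curr_frame at time t.
    Returns an output frame whose grayscale is proportional to the
    assumed firing rate of the output neuron (0 means the neuron is at rest).
    """
    output = []
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        out_row = []
        for g_prev, g_curr in zip(prev_row, curr_row):
            n1_rate = max(0, g_curr - g_prev)  # first interneuron: brightness increased
            n2_rate = max(0, g_prev - g_curr)  # second interneuron: brightness decreased
            drive = n1_rate + n2_rate          # output neuron is excited by both
            if drive <= min_change:            # below the sensitivity threshold: at rest
                drive = 0
            out_row.append(min(255, int(gain * drive)))
        output.append(out_row)
    return output


# A bright object (200) moves one pixel to the right over a dark static background.
previous = [[0, 200, 0, 0]]
current = [[0, 0, 200, 0]]

print(motion_frame(previous, current))   # [[0, 200, 200, 0]]
print(motion_frame(previous, previous))  # [[0, 0, 0, 0]]
```

Both the pixel the object left and the pixel it entered respond, while unchanged background pixels stay black, so no background model has to be computed or subtracted.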

Impulse Neuron Model
There are different models of impulse neurons [Gerstner, W. & Kistler, W., 2002]: Hodgkin-Huxley, integrate-and-fire, the impulse response model (Spike Response Model), etc. The most detailed and complex is the Hodgkin-Huxley model [Muder A., Alia A., Amer A., Saleh A. & Abdul, R., 2020]. It is based on experimental studies of squid neurons. The system of differential equations of this model describes precisely how the potential of the neuron membrane responds to different inputs. However, this realism comes at a high computational cost, so the model is poorly suited to experiments with neural networks composed of large numbers of neurons, as in this case.
Following [Wu, Q., McGinnity T., Maguire L. & Cai J., 2008], the motion detector being developed uses the integrate-and-fire (IaF) neuron model, which is simpler to describe mathematically and quite efficient.
The IaF model treats pulses as short current pulses. Once an impulse arrives at a synapse, all associated post-synaptic neurons are immediately charged. This voltage change is called the post-synaptic potential. Once the potential of the neuron's membrane reaches the threshold value, it is reset and a new impulse is generated.
http://cis.ccsenet.org Computer and Information Science Vol. 15, No. 4;
Let $G_{x,y}(t)$ be the grayscale brightness of a single pixel $(x, y)$ of the input image at time $t$, $S^{ex}_{x,y}(t)$ the conductivity of the excitatory synapse from the receptor $(x, y)$, and $S^{ih}_{x,y}(t)$ the conductivity of the inhibiting synapse from the receptor $(x, y)$. Then the formulae transforming the grayscale brightness take the form
$$S^{ex}_{x,y}(t) = \alpha\, G_{x,y}(t), \qquad S^{ih}_{x,y}(t) = \beta\, G_{x,y}(t),$$
where $\alpha$ and $\beta$ are conversion coefficients. According to [Wu, Q., McGinnity T., Maguire L. & Cai J., 2008], the pulsed neuron $N_1(x, y)$ can be characterized by the following equations:
$$\frac{d g^{ex}_{N_1(x,y)}(t)}{dt} = -\frac{1}{\tau_{ex}}\, g^{ex}_{N_1(x,y)}(t) + S^{ex}_{x,y}(t),$$
$$\frac{d g^{ih}_{N_1(x,y)}(t)}{dt} = -\frac{1}{\tau_{ih}}\, g^{ih}_{N_1(x,y)}(t) + S^{ih}_{x,y}(t - \Delta t),$$
$$c_m \frac{d v_{N_1(x,y)}(t)}{dt} = g_l \left(E_l - v_{N_1(x,y)}(t)\right) + \frac{w^{ex}_{N_1(x,y)}\, g^{ex}_{N_1(x,y)}(t)}{A_{ex}} \left(E_{ex} - v_{N_1(x,y)}(t)\right) + \frac{w^{ih}_{N_1(x,y)}\, g^{ih}_{N_1(x,y)}(t)}{A_{ih}} \left(E_{ih} - v_{N_1(x,y)}(t)\right),$$
where $g^{ex}_{N_1(x,y)}(t)$ and $g^{ih}_{N_1(x,y)}(t)$ are the conductivity values of the membrane for the excitatory and inhibiting synapses, respectively, connecting the receptor $(x, y)$ and the neuron $N_1(x, y)$; $\tau_{ex}$ and $\tau_{ih}$ are the characteristic synaptic times of the excitatory and inhibiting synapses, respectively (usually 2 ms); $\Delta t$ is the synaptic delay in the transfer of an impulse from the receptor $(x, y)$ to the neuron $N_1(x, y)$; $v_{N_1(x,y)}$ is the membrane potential of the neuron $N_1(x, y)$; $E_l$ is the equilibrium potential of the neuron membrane; $g_l$ is the neuron membrane conductivity; $E_{ex}$ and $E_{ih}$ are the equilibrium potentials of the excitatory and inhibiting synapses, respectively; $A_{ex}$ is the surface area of the membrane of the neuron $N_1(x, y)$ connected to the excitatory synapse; $A_{ih}$ is the surface area of the membrane of the neuron $N_1(x, y)$ connected to the inhibiting synapse; $c_m$ is the specific capacity of the neuron membrane; $w^{ex}_{N_1(x,y)}$ is the strength of the excitatory synaptic connection between the receptor $(x, y)$ and the interneuron $N_1(x, y)$; and $w^{ih}_{N_1(x,y)}$ is the strength of the inhibiting synaptic connection between the receptor $(x, y)$ and the interneuron $N_1(x, y)$.
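The integrate-and-fire behavior of a single interneuron can be explored with a small forward-Euler simulation. The sketch below assumes a conductance-based IaF neuron in which each synaptic conductance decays with its characteristic time and is driven by the pixel brightness, with one of the two inputs delayed. All numeric values (apart from the 2 ms synaptic time), the threshold and reset potentials, the stronger inhibitory weight used to keep the neuron balanced at constant brightness, and the name `simulate_interneuron` are illustrative assumptions and not the settings of the paper or of the cited model.

```python
def simulate_interneuron(brightness, delayed="ih", dt=0.1, delay_ms=10.0):
    """Forward-Euler simulation of one conductance-based IaF interneuron.

    brightness: grayscale samples G(t), one per time step of dt milliseconds.
    delayed="ih": the inhibitory input is delayed (responds to brightness increase).
    delayed="ex": the excitatory input is delayed (responds to brightness decrease).
    Returns the list of spike times in milliseconds.
    """
    # Illustrative parameter values (assumptions).
    tau_ex = tau_ih = 2.0                  # characteristic synaptic times, ms
    alpha = beta = 0.001                   # brightness-to-conductance coefficients
    c_m, g_l = 1.0, 0.1                    # membrane capacity and leak conductivity
    e_l, e_ex, e_ih = -70.0, 0.0, -75.0    # equilibrium potentials, mV
    w_ex, w_ih = 1.0, 5.0                  # synaptic strengths
    a_ex = a_ih = 1.0                      # membrane surface areas
    v_th, v_reset = -60.0, -70.0           # firing threshold and reset potential, mV

    lag = int(round(delay_ms / dt))        # synaptic delay in time steps
    g_ex = g_ih = 0.0
    v = e_l
    spike_times = []
    for step, g_now in enumerate(brightness):
        g_past = brightness[max(0, step - lag)]  # brightness one delay earlier
        if delayed == "ih":
            drive_ex, drive_ih = g_now, g_past
        else:
            drive_ex, drive_ih = g_past, g_now
        # Synaptic conductances: exponential decay plus brightness-driven input.
        g_ex += dt * (-g_ex / tau_ex + alpha * drive_ex)
        g_ih += dt * (-g_ih / tau_ih + beta * drive_ih)
        # Membrane equation: leak, excitatory and inhibitory currents.
        dv = (g_l * (e_l - v)
              + w_ex * g_ex * (e_ex - v) / a_ex
              + w_ih * g_ih * (e_ih - v) / a_ih) / c_m
        v += dt * dv
        if v >= v_th:                      # threshold reached: emit a spike and reset
            spike_times.append(step * dt)
            v = v_reset
    return spike_times


steady = [50] * 1000                # constant brightness for 100 ms
rise = [50] * 500 + [200] * 500     # brightness increases at t = 50 ms
fall = [200] * 500 + [50] * 500     # brightness decreases at t = 50 ms

print(len(simulate_interneuron(steady)))                   # 0
print(len(simulate_interneuron(fall)))                     # 0
print(len(simulate_interneuron(rise)) > 0)                 # True
print(len(simulate_interneuron(fall, delayed="ex")) > 0)   # True
```

With these assumed parameters the neuron with delayed inhibition fires only during the short interval after a brightness increase, before the delayed inhibition catches up, and the neuron with delayed excitation does the same for a decrease.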

Conclusion
The above approach for detecting and separating moving objects is an attempt to simulate the ability of the human eye to isolate moving objects quickly and to surpass existing deterministic methods in the speed of selecting moving objects and in economy of computing resources. A motion detector has been developed on the basis of this approach as a software module that can be used in digital video image processing (the motion indication sensor is an impulsed neural network simulating the retina of the eye). The detector is intended for use in automated traffic management systems as an alternative to existing detectors, even taking into account the possible improvement of the latter through parallel computation for simultaneous processing of video segments and selection of moving objects within each of them [Sadhukhan, P. & Gazi, F., 2018]. The main problem in using this type of detector in traffic management systems is eliminating noise, because any small change in the energy/brightness of an image pixel will be registered as a moving object. Further research will address optimization of the parameters of the impulsed neuron model and the impulsed neural network in order to increase the accuracy and stability of detecting the movement of target objects and to prevent false alarms caused by changing lighting, cast shadows, dynamic background, weather conditions, object camouflage, etc. In addition, further research is planned on the optimal selection of the brightness/energy sensitivity threshold of the motion detection impulse neuron, in order to reduce false motion alarms and the number of calculations, and on the impact of this threshold on the quality and stability of moving object detection.
The disadvantage of impulse-based sensors is the difficulty of detecting motion when the moving object and the image background have the same or a similar color. In this case, the light reflected from the moving object and from the background has approximately the same brightness, which makes it hard to detect the motion of the object. Increasing the sensitivity threshold of the impulse neuron can lead to excessive noise in the image. To solve such problems, it is intended to develop an optimal segmentation of the image into regions by color/brightness and to increase the sensitivity threshold near the borders of these regions, followed by quantitative and qualitative experiments on the CDnet 2014 dataset (http://changedetection.net/) and the BMC 2012 dataset (http://backgroundmodelschallenge.eu/).
It should also be borne in mind that impulse neural network elements can be implemented in hardware [Dinu, A., Cirstea, M. & Cirstea, S., 2010], or software using modern parallel computing technologies based on graphics processors [Gonzalez, R. & Woods, R., 2018] [Appleyard, J., Kocisky, T. & Blunsom, P. , 2016] [Weninger, F., Bergmann, J., & Schuller, B., 2015]. This can significantly accelerate the selection of moving objects in video images, although it may be costly to implement and pre-configure the motion detector.
Innovation Project BK-202206 "Research on Intelligent Computer Integrated Manufacturing/Industrial control and monitoring. Methodology Development of Innovative Human-Machine Interfaces" and partly contributed by OYGJS-2021002 "Application research and development of a distributed photovoltaic power generation system" (School of Information Engineering, Xi'an Eurasia University, Xi'an, China) in order to develop innovative Human-Machine Interfaces, training equipment and its intelligent control system based on computer vision and to integrate them with IoT (Internet of Things) and IIoT (Industrial Internet of Things).