Approach of RSOR Algorithm Using HSV Color Model for Nude Detection in Digital Images

This paper analyzes the application of pixel segmentation techniques, the recognition and selection of image regions, and the operations performed on those regions in order to detect nudity in digital images. The research aims to develop a software tool capable of detecting nudity in digital images. Segmentation in the HSV (Hue, Saturation, Value) color model is used to locate and separate the pixels corresponding to human skin. The Recognition, Selection and Operations in Regions (RSOR) algorithm is proposed to recognize and separate the region with the highest number of skin pixels within the segmented image (the largest region). Once the largest region has been selected, the RSOR algorithm calculates the percentage of skin pixels in the segmented image relative to the original image, and then the percentage of the largest region relative to the segmented image, in order to identify whether the image contains a nude. The criteria for deciding whether an image depicts a nude are the following: if the percentage of skin pixels in the segmented image, relative to the original image, is less than 25%, the image is not considered a nude; if it exceeds this percentage, the image is considered a nude. However, when the percentage of the largest region is then estimated and amounts to less than 35%, the image is definitely not a nude. The final result is a message that informs the user whether or not the image is a nude. The RSOR algorithm obtains a false-positive rate of 4.7%, which is low compared to other systems, and it has shown strong performance for nudity detection.


Introduction
Automated nudity detection in digital images is a research topic focused on building tools that help prevent the traffic of pornographic material through the Internet, where child pornography is a criminal activity that uses the Internet as its main distribution channel. The Internet has facilitated access to pornography. Approximately 5 million Internet pages promote and distribute child pornography. The amount of child pornography on the Internet is estimated at more than 6 million different images. The specialized British police unit that fights child pornography alone holds a database of approximately 3 million photographs. In addition, there are pornographic videos, stories, and other forms of child pornography. The U.S. is the biggest producer of pornography on the Internet, followed by South Korea. In Latin America, Brazil and Argentina stand out. Crimes related to the dissemination, distribution, and sale of child pornography on the Internet account for over 50% of the crimes committed on the network. This was revealed during the XVII INTERPOL meeting.
Companies such as Alia2, developer of the tool "Carolina", and NetClean Technologies, creator of "ProActive", are developing software tools that help detect, track, and prevent the traffic of child pornography on the Internet. The Carolina project, mentioned in Larazon.es, is software that detects child pornography content within an Internet site, blocks the download of such content, and notifies the authorities. It consists of small plug-ins that are added to the browser and, once installed on a computer, detect all child pornography content. This is achieved by comparing the content (videos and images) of the file to be downloaded with a database that holds specific codes for known content.
ProActive, referred to in PCWorld Magazine, is software that uses a database of approximately 400,000 images and videos collected by the Swedish police. The NetClean ProActive system creates a digital signature for each image, using a method similar to that adopted by antivirus companies. The system analyzes digital images and digital signatures; if the image or the signature is found in the central database, the user is notified of having accessed pornographic content, and the image is blocked. The software also monitors the external drives of computers, which are commonly used for the distribution of child pornography. NetClean can detect the statistically rare and extremely serious crime of using a business network to distribute child pornography. This is its main strength.
The main objective of this research is nudity recognition in digital images, using the image pixels as the information source to determine whether the image represents possible nudity based on human skin color. Several methods for pixel segmentation in digital images have been employed (Johnson, Lok, Wong and Da Silva, 2007; Ap-apid, 2005; Fleck, Forsyth and Bregler, 1996; Mahjoub, 2010; Forsyth and Fleck, 1999); the most relevant are those based on a color model suited to human skin color, which in this case is the HSV model. To narrow the range of skin pixels in the HSV model, HSV segmentation techniques are used to separate the image pixels that fall within the interval for human skin color. As an improvement to HSV segmentation, the RSOR algorithm is used to decide whether or not the processed image contains nudity. The RSOR algorithm combines two kinds of image processing techniques: region recognition techniques, implemented to recognize the regions in the segmented image, and region selection techniques, used to identify the largest region. The percentages of skin in the segmented image and within the largest region are then estimated in order to find nudity in the analyzed image.
(Insert Figure 1 here)
Figure 1 displays the processes and subprocesses followed in the digital image analysis. The blocks with a solid line illustrate techniques that have already been implemented for nudity detection in digital images and are employed to obtain the information sources for the final result. This final result is an output message that states whether or not the image represents nudity. The dotted lines show the RSOR algorithm as the part of the process that produces the second information source considered in generating the final result, which is the largest region.
In order to obtain the primary information source, pixel segmentation based on the HSV color model is applied to recognize and separate skin pixels from the original image. To group human skin colors in the HSV space, an assorted set of images with Caucasian, Asian, and African skin examples is used to create the sample image for nudity recognition in digital images (Johnson et al., 2007; Ap-apid, 2005). Once separated, the recognized skin pixels are stored in a new binary digital image. The second information source is obtained by applying the recognition and selection of the largest region, the initial part of the RSOR algorithm, which recognizes the regions in the segmented image and selects the largest one, storing it in a new binary image.
Finally, the RSOR algorithm applies operations that calculate percentages from the information sources to obtain the final result, since the RSOR algorithm is responsible for deciding whether the information obtained from the segmentation and from the largest region represents possible nudity in the original image. The first partial result is obtained by calculating the percentage of pixels contained in the first information source relative to the original image, giving the first approximation to the final result. The second partial result is obtained by calculating the percentage of pixels contained in the second information source relative to the primary information source, giving the second approximation to the final result. Once the partial results are obtained, it can be decided whether the processed image represents nudity, with the second approximation carrying the greater weight in generating the final result (Asaf, Vargas, Rueda, Bulkan, Chen and Hung, 2006; Ap-apid, 2005; Thompson, 2009).
The research proposes the use of techniques for nudity detection in digital images that help determine the proportion of pixels with human skin color that represent potential nudity. Furthermore, some strategies are proposed to improve on the detection found in other existing systems.

Image Segmentation using HSV model
The first step in determining whether the image represents nudity is to separate the skin pixels detected in the original image using pixel segmentation in the HSV model.

(Insert Figure 2 here)
Figure 2 shows the sequence followed by the HSV segmentation process. The first step is to analyze the sample image, which contains skin segments, and create the sample vector (the vector with the highest concentration of skin pixels in the HSV model); the original image is then segmented using the sample vector.

HSV Model
The HSV model defines a color in terms of its components in cylindrical coordinates:
 Hue, the color type (such as red, blue, or yellow). It is represented in degrees, with possible values ranging from 0 to 360° (although for some applications they are normalized from 0 to 100%). Each value corresponds to a color; for example, 0 is red, 60 is yellow, and 120 is green.
 Saturation. It is represented as the distance from the black-white axis. The possible values range from 0 to 100%. This parameter is also often called "purity", by analogy with excitation purity and colorimetric purity. The lower the saturation of a color, the more grayish and faded it appears. It is therefore useful to define desaturation as the qualitative inverse of saturation.
 Value, the brightness of the color. It represents the height along the black-white axis. The possible values range from 0 to 100%. A value of 0 is always black. Depending on the saturation, 100 can be white or a more or less saturated color.
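As an illustration of the component ranges described above, the following minimal sketch converts normalized RGB colors to HSV with Python's standard-library colorsys module and rescales the result to degrees and percentages. Python is used here only for illustration (the tool described in this paper relies on MATLAB functions), and the helper name rgb_to_hsv_degrees is hypothetical.

```python
import colorsys


def rgb_to_hsv_degrees(r, g, b):
    """Convert normalized RGB (0.0 to 1.0) to (hue in degrees, saturation %, value %)."""
    # colorsys returns all three components normalized to the range 0.0 to 1.0.
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return round(h * 360.0, 2), round(s * 100.0, 2), round(v * 100.0, 2)


red = rgb_to_hsv_degrees(1.0, 0.0, 0.0)
yellow = rgb_to_hsv_degrees(1.0, 1.0, 0.0)
green = rgb_to_hsv_degrees(0.0, 1.0, 0.0)
black = rgb_to_hsv_degrees(0.0, 0.0, 0.0)

print(red, yellow, green, black)
```

The hue values obtained for pure red, yellow, and green are 0, 60, and 120 degrees, matching the examples given for the Hue component, and black has a Value of 0.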

Image segmentation
The segmentation starts with the analysis of all the skin segments in a digital image that is normalized into the HSV model, in order to obtain the approximate range of pixel colors corresponding to human skin.
(Insert Figure 4 here)
The sample values are plotted and analyzed to create the sample vector.
(Insert Figure 5 here)
Figure 5 contains the graph of the values representing the pixels found by the analysis of Figure 4 within the S and H components. The area marked with a rectangle is the one containing the largest number of pixels recognized as skin, limiting the range to the vector ((0.02, 0.07), (0.2, 0.7)). The segmentation is carried out by separating, from the original image, the pixels within this range, yielding the segmented image that contains the pixels recognized as skin.
(Insert Figure 6 here)
Figure 6 shows the HSV segmentation process, where the segmented image is obtained by separating the skin pixels in the original image. The segmented image is a binary image.
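The segmentation step can be sketched as follows. This is a minimal illustration in Python, not the implementation used in this research. It assumes that the first pair of the sample vector, (0.02, 0.07), is the hue range, that the second pair, (0.2, 0.7), is the saturation range, that both are normalized to the interval from 0 to 1, and that the Value component is ignored. The names H_RANGE, S_RANGE, is_skin, and segment are hypothetical.

```python
import colorsys

# Assumed reading of the sample vector ((0.02, 0.07), (0.2, 0.7)).
H_RANGE = (0.02, 0.07)  # hue interval, normalized to 0..1 (assumption)
S_RANGE = (0.2, 0.7)    # saturation interval, normalized to 0..1 (assumption)


def is_skin(r, g, b):
    """Return True if an 8-bit RGB pixel falls inside the assumed skin range."""
    h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return H_RANGE[0] <= h <= H_RANGE[1] and S_RANGE[0] <= s <= S_RANGE[1]


def segment(image):
    """Build a binary image: 1 where the pixel is recognized as skin, else 0."""
    return [[1 if is_skin(*pixel) else 0 for pixel in row] for row in image]


# Tiny 2 x 2 example image: two skin-like tones, one blue and one white pixel.
image = [
    [(224, 172, 138), (0, 0, 255)],
    [(255, 255, 255), (198, 134, 100)],
]
mask = segment(image)
print(mask)
```

In this example only the two skin-like tones fall inside both intervals, so the resulting binary image marks them with 1 and the blue and white pixels with 0.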

Approach of RSOR Algorithm
The RSOR algorithm performs the recognition and selection of the largest region in a segmented image, evaluates the information obtained by using operations to calculate percentages, and decides whether or not the original image contains nudity. This process consists of three main parts:
 Region recognition.
 Region selection.
 Operations for the selected region.
(Insert Figure 7 here)
Figure 7 shows that the RSOR algorithm selects the largest region in the segmented image and performs operations to obtain partial results.

Recognition and selection of largest region.
In the field of computer vision, region recognition refers to several techniques aimed at detecting points or regions that are either lighter or darker than their surroundings in digital images. It is especially useful for separating objects in a binary image with a fast, interactive method, which uses two functions defined in MATLAB to speed up this process:
 bwlabel, to label the regions.
The Max_Area value is used to obtain the percentage corresponding to the largest region with respect to the segmented image, using the following formula:
P_LargestRegion = (Max_Area * 100) / Skin
P_LargestRegion is the percentage that corresponds to the largest region with respect to the segmented image, and it is evaluated on the following criteria once the percentage of skin in the segmented image has reached 25%:
If (Percentage >= 25%)
    If (P_LargestRegion >= 35%)
        Print << "This is a nude"
    Else If (P_LargestRegion < 35%)
        Print << "It's not a nude"
    End If
End If
This criterion uses two indicators, the percentage of skin within the segmented image and the percentage of skin within the largest region, to decide whether the original image is a nude. The second indicator has more weight in generating the final result.
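The recognition, selection, and evaluation steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the MATLAB implementation used in this research. Regions are labeled with a breadth-first flood fill using 8-connectivity (an assumption), the 25% and 35% thresholds are those given in the abstract, and the treatment of values exactly at a threshold is also an assumption. The function names region_areas and classify are hypothetical.

```python
from collections import deque


def region_areas(mask):
    """Return the pixel count of every 8-connected region of 1s in a binary mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    areas = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited skin pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            area = 0
            while queue:
                y, x = queue.popleft()
                area += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            areas.append(area)
    return areas


def classify(mask, skin_threshold=25.0, region_threshold=35.0):
    """Apply the two percentage criteria to a binary segmented image."""
    size = len(mask) * len(mask[0])          # pixels in the original image
    skin = sum(sum(row) for row in mask)     # skin pixels in the segmented image
    percentage = skin * 100.0 / size
    if percentage < skin_threshold:
        return "It's not a nude"
    max_area = max(region_areas(mask))       # size of the largest region
    p_largest_region = max_area * 100.0 / skin
    if p_largest_region < region_threshold:
        return "It's not a nude"
    return "This is a nude"


# One compact 3 x 3 skin block inside a 4 x 4 image.
solid = [[1, 1, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0], [0, 0, 0, 0]]
# Nine isolated skin pixels spread over a 5 x 5 image.
scattered = [[1 if r % 2 == 0 and c % 2 == 0 else 0 for c in range(5)]
             for r in range(5)]
# A single skin pixel in a 4 x 4 image.
sparse = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

print(classify(solid), classify(scattered), classify(sparse), sep="\n")
```

The scattered example shows why the second indicator carries more weight: its skin percentage (36%) passes the first criterion, but its largest region holds only one of the nine skin pixels, so the image is rejected.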

Experimental results
This section presents the performance of HSV segmentation and of the RSOR algorithm. Performance is the efficiency of the algorithm for nudity detection in digital images; it is obtained by analyzing several images and computing the percentage of false-positive detections. The lower the percentage, the better the algorithm performs. The RSOR algorithm functionality tests are also presented. Functionality is the sequence of steps that the algorithm follows in the image analysis to obtain the final result.

Performance tests
The performance is obtained by analyzing 100 digital images, 80 of which are images of nudity and 20 of which are not. The results of the HSV segmentation performance are shown in Table 1, where it can be observed that HSV segmentation obtains a false-positive rate of 19%.

(Insert Table 1 here)
Using the same set of images employed to analyze the performance of HSV segmentation, performance tests are carried out on the RSOR algorithm in order to compare the results with those obtained from the segmentation.

(Insert Table 2 here)
Table 2 shows the comparison between the HSV segmentation performance and the RSOR algorithm performance. It shows that the RSOR algorithm reduces false positives by 17% and provides a better approximation to the final result.

Functionality Testing
Table 3 shows the results of the RSOR algorithm on a group of different images.
(Insert Table 3 here)
The first image clearly depicts a nude. The second image is a car, not a nude. The third image is used to show nudity detection on various skin tones. The fourth image shows an athlete. The fifth image shows how the largest region has more influence on the final result. The sixth image depicts a person in a bathing suit. The seventh image clearly depicts a nude, and the eighth image displays a common error: objects with colors similar to those of skin lead to a false result.

Comparison of results
A set of 254 digital images is used, of which 234 are nude images, representing 92.13% of the total. On these, the RSOR algorithm obtains a false-positive rate of 4.7%, with 11 false positives, while on the 20 non-nude images, representing the remaining 7.87%, a false-positive rate of 10% is obtained, with 2 false positives.
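The reported rates can be checked with a short calculation. The following Python sketch only reproduces the arithmetic from the counts given above; the helper name rate_percent is hypothetical.

```python
def rate_percent(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(count * 100.0 / total, 2)


nude_rate = rate_percent(11, 234)    # false positives on the 234 nude images
non_nude_rate = rate_percent(2, 20)  # false positives on the 20 non-nude images

print(nude_rate, non_nude_rate)
```

With 11 false positives out of 234 images the rate is 4.7%, and with 2 out of 20 it is 10%, in agreement with the reported figures.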
The false-positive results obtained from non-nude images are not considered in the performance of the algorithm, since the objective is to identify nude people in digital images. The false-positive results found on images of nudity are due to characteristics resembling skin color, as in the eighth image in Table 3, which shows a person wearing a color similar to skin and therefore generates a false-positive result. The comparative table with the RSOR algorithm and other systems is presented below (Ap-apid, 2005).

(Insert Table 4 here)
Table 4 shows that the RSOR algorithm obtains a low rate of false positives compared to other existing systems, which makes it the best option among them for nudity detection in digital images.

Conclusions and future work
The main objective of this research is the development of the RSOR algorithm as an application for nudity detection in digital images. The algorithm uses a combination of techniques for building systems that allow tracking and detecting pornography on the Internet. It uses image processing techniques to isolate the pixels of human skin color, and it is used to decide whether or not the image is a nude. The algorithm shows better performance for nudity detection in digital images and makes the following contributions:
1) The analysis process for digital images is simplified by using functions defined in MATLAB.
2) Pixel segmentation techniques are combined, between the HSV segmentation and the RSOR algorithm, in the recognition and selection that yield the largest region as the source of information for nudity detection.
3) The percentages calculated by the RSOR algorithm give better results than the percentages obtained by HSV segmentation alone.
4) A low percentage of false positives is obtained in the tests.
Future work will increase the number of images used in the performance tests in order to optimize the RSOR algorithm by reducing the percentage of false positives currently obtained. A further proposal is to carry out the image analysis in such a way as to identify whether the subject is an adult or a child, since the objective of this research is to build a system to prevent the traffic of child pornography on the Internet, a growing problem for which the Internet is the main distribution channel. The proposed RSOR algorithm is the first step in building this system.
PCWorld.com.mx, "Crean sistema para combatir la pornografía infantil en línea", [Online]. Available: http://www.pcworld.com.mx/Articulos/7950.htm#

Figure 3.
Figure 3 contains the HSV coordinate system in the form of a hexacone. The base color is black, with HSV values (0, 0, 0). Most color photographs are based on the RGB model. Given a color defined by RGB, where R, G and B are normalized from 0.0 to 1.0, the equivalent HSV color is determined by Equation 1.

Figure 4.
Figure 4 displays the digital image containing different segments of skin, which is analyzed in two steps: obtain the sample values, and plot the sample values. To obtain the sample values, apply the following algorithm:

Equation 2.
Calculation of the total number of skin pixels in the segmented image, where ImageS(x, y) is the n x m matrix that contains the segmented image. Once the number of skin pixels in the segmented image has been obtained, the RSOR algorithm calculates the percentage of skin pixels in the segmented image relative to the size of the original image, using the following variables:
Size = # pixels in original image
Skin = # pixels in segmented image
Percentage = (Skin * 100) / Size
Percentage is the percentage of skin pixels in the original image, used to decide whether nudity exists in the original image, following the next evaluation points:

Figure 9.
Figure 9 displays the plot of the performance tests for HSV segmentation and the RSOR algorithm. The plot shows the average false-positive rate over five tests with 20, 50, and 100 images.

Table 1.
The HSV segmentation performance

Table 2.
Performance comparison between the HSV segmentation and the RSOR algorithm

Table 3.
Tests on a diverse set of images

Table 4.
Comparison of results with existing algorithms or systems