Real-time Automated Detection and Recognition of Nigerian License Plates via Deep Learning Single Shot Detection and Optical Character Recognition

License plate detection and recognition are critical components of a connected intelligent transportation system but are underused in developing countries because of the associated costs. Existing high-accuracy license plate detection and recognition systems require Graphics Processing Units (GPUs), which may be difficult to come by in developing nations. Single-stage detectors and commercial optical character recognition engines, on the other hand, are less computationally expensive and can achieve acceptable detection and recognition accuracy without a GPU. In this work, a pre-trained SSD model and a Tesseract tessdata-fast traineddata file were fine-tuned on a dataset of more than 2,000 images of vehicles with license plates. These models were combined with a unique image preprocessing algorithm for character segmentation and tested on a general-purpose personal computer using a new collection of 200 images of vehicles with license plates. On this test set, the plate detection system achieved a detection accuracy of 99.5% at an IOU threshold of 0.45, while the OCR engine recognized all characters correctly on 150 license plates, misrecognized one character on 24 license plates, and misrecognized two or more characters on 26 license plates. The detection procedure took an average of 80 milliseconds, while the character segmentation and recognition stages took an average of 95 milliseconds, resulting in an average processing time of 175 milliseconds per image, or roughly 6 images per second. These results are suitable for real-time traffic applications.


Introduction
The detection and recognition of license plates are critical parts of traffic monitoring and are essential to the development of a connected intelligent transportation system. Automatic License Plate Detection and Recognition (ALPR) engines have been extensively explored and have attained state-of-the-art results with the recent use of Graphics Processing Units (GPUs) in deep learning computation (Hendry & Chen, 2019). However, these GPU systems are expensive and may not be widely available in developing nations. On the other hand, the recent increase in ICT availability in developing countries has made basic ICT tools such as mobile phones and general-purpose computers available in these areas (Avgerou et al., 2016). Thus, deploying ALPR systems on readily accessible general-purpose PCs will facilitate ALPR usage in developing countries.
The ALPR process consists of three main stages: license plate detection, character segmentation, and character recognition (Agbemenu, Andrew et al., 2018). All stages are important, but a failure in the plate detection stage causes the entire ALPR process to fail. Deep learning-based plate detection systems have the lowest false detection rates (Wang et al., 2021). However, deep learning-based license plate detection and recognition systems that achieve exceptional accuracy typically use two-stage deep learning techniques such as R-CNN and FR-CNN, which are too heavy to operate in real-time on simple non-GPU devices (Zebin et al., 2019). One-stage detectors such as YOLO and SSD are faster and can operate in real-time, but they produce multiple errors due to their poor performance in identifying small objects such as plate number characters (E. Dong et al., 2018). Character segmentation and character recognition using deep learning techniques have also performed excellently in the literature (Shivakumara et al., 2018) but are likewise computationally expensive. Optical character recognition engines, in contrast, are intended for large-scale text recognition with decent accuracy, and they are optimized to run quickly on both GPU and non-GPU platforms.
This study aims to develop a system for detecting and recognizing Nigerian license plates in real-time using non-GPU general-purpose computers. The above objective will be achieved by integrating a computationally efficient fine-tuned single shot detector object detection pipeline and optical character recognition. The system will also be able to provide additional information about the class of the license plate and the issuer's information.
The rest of this paper is structured as follows. Section 2 gives a literature review of the various ALPR stages. Section 3 explains the proposed system and its implementation. Section 4 contains the results and discussion of validating the developed system on a test dataset. Section 5 concludes the paper.

Literature Review
ALPR is a critical component of intelligent transportation system (ITS) security and traffic control systems and has received extensive attention from researchers. Images of stationary or moving vehicles with valid license plate instances are the primary input to ALPR systems. ALPR consists of three major steps, which are implemented in the literature using various processing methods. The first step of ALPR is license plate detection and localization. Edge detection and feature-based detection algorithms are two extensively utilized license plate detection and localization techniques in the literature. Edge detection (Agbemenu, Andrew et al., 2018; Dalarmelina et al., 2020; Ibiyemi et al., 2020; Oluchi et al., 2019) yields high inferencing speed but also produces several incorrect candidates since the system frequently detects undesirable edges (Hendry & Chen, 2019). Feature-based detection techniques make use of license plate-specific features such as color and texture (Deb & Jo, 2008; Wu et al., 2013), or a mix of features. Deep learning plate detectors that use several license plate features to localize plates mostly employ machine learning models and techniques such as support vector machines (SVM) (Oluchi et al., 2019) and convolutional neural networks (CNN) (M. Dong et al., 2017; Shivakumara et al., 2018). The capacity of deep learning-based detectors to use several features of a license plate in localizing it enhances plate detection accuracy and generalization at the expense of greater processing cost.
Character segmentation in ALPR entails denoising the detected plate image and separating characters into individual units that can be properly recognized. Commonly used character segmentation techniques are character erosion and dilation with edge and contour detection (Attah et al., 2016; Dalarmelina et al., 2020), character aspect ratio sampling (Agbemenu, Andrew et al., 2018; Ibiyemi et al., 2020), and the use of multiple sliding concentric window character region of interest (ROI) pooling (Anagnostopoulos et al., 2006). CNN-based ALPR systems (Hendry & Chen, 2019; Shivakumara et al., 2018) often replace the character segmentation stage with a localization process carried out by the deep learning model. In this process, individual plate characters are localized feature-wise and then automatically segmented before they are passed on for recognition. Character segmentation is important for proper character recognition: if characters are not well segmented, the accuracy of the character recognition stage suffers.
Character recognition (CR) is frequently the final stage in most ALPR systems. In the work of (Ibiyemi et al., 2020), CR was achieved using template matching. This method of CR has little capacity for generalization and is computationally expensive. In the studies of (M. Dong et al., 2017; Hendry & Chen, 2019), CR was achieved using multiple lightweight CNN classifier models. These models were trained to detect each character of the 35 license plate characters (A-Z and 0-9). They achieved high accuracy at the expense of speed. CR performed using OCR engines such as Tesseract (an RNN-based LSTM engine maintained by Google) (Agbemenu, Andrew et al., 2018; Dalarmelina et al., 2020) is very fast but typically achieves full license plate recognition accuracies between 60 and 70%. These recognition accuracy values can be attributed to the character segmentation methods employed in the studies, which do not completely conform to the black-text-on-white-background requirement of the OCR engine (tesseract-ocr, 2021b).
In this study, we combined a lightweight SSD MobileNetV2 CNN license plate object detector with a fast OCR engine to obtain higher plate detection and recognition accuracy at a real-time inferencing speed. We also focused heavily on the character segmentation stage to supply our OCR engine with text blobs that meet the engine's requirements and so improve character recognition accuracy. The license plate of emphasis in this study is the Nigerian license plate system, which has been fully discussed in the work of (Ibiyemi et al., 2020).

Materials and Methods
In this section, we introduce the proposed real-time ALPR system and its constituents.

Proposed System
The proposed system is designed to acquire stationary images of vehicles from any input source. The images used in this study were acquired under various lighting conditions using an 8-megapixel smartphone camera, with each picture having an average size of 3 MB. A total of 700 images of vehicles with Nigerian license plates (both old and new) were captured and digitally augmented by rotation and skewing to provide a dataset of 2100 images for training the plate detection and character recognition algorithms. A further 200 images were captured and reserved for validating the ALPR system. To correctly detect and recognize a license plate, the system proceeds through the algorithm below:

End if
The software implementation of the above algorithm is achieved using the Python programming language.
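The three stages described in the following subsections can be pictured as a simple chain. The sketch below is only an illustration of that control flow under stated assumptions: the function name `run_alpr`, the three stage callables, and the `Box` type are hypothetical and are not taken from the authors' implementation.

```python
from typing import Callable, Optional, Tuple

# Hypothetical box type: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def run_alpr(
    image,
    detect_plate: Callable[[object], Optional[Box]],
    segment_plate: Callable[[object, Box], object],
    recognize_text: Callable[[object], str],
) -> Optional[str]:
    """Chain detection, segmentation and recognition; None if no plate is found."""
    box = detect_plate(image)
    if box is None:
        # A failed detection ends the whole ALPR process for this image.
        return None
    strip = segment_plate(image, box)
    return recognize_text(strip)
```

Passing the stages in as callables keeps the sketch independent of any particular detector or OCR engine, so each stage can be swapped or tested in isolation.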

License Plate Detection
In this stage, a high-quality image is downsampled and sent into a deep learning object detection pipeline. The pipeline returns box coordinates that describe the bounding box of the detected license plate. These coordinates are then mapped back to their corresponding positions on the high-quality input image. The license plate ROI is retrieved from the resulting bounding box and passed to the next stage of the ALPR. A MobileNetV2-based (Sandler et al., 2018) single shot detector pipeline from the TensorFlow Object Detection team was the model of choice in this study. This model was chosen for its flexibility, its fast inferencing speed without a GPU, and its capacity for model optimization, which reduces the size of its weights and biases and further increases the inferencing speed. It has an input feature size of 320 by 320 by 3 and is maintained by (Tf2_detection_zoo.Md, 2021). Input images are first resized from their original sizes to 320 by 320 pixels via the OpenCV resize function and then passed to the pipeline for plate detection. The accuracy of the plate detection pipeline on the validation set is calculated using Equation 1.
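The coordinate mapping and the IOU criterion used for scoring detections can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the detector returns boxes normalized to [0, 1] in (ymin, xmin, ymax, xmax) order, as TensorFlow Object Detection API models do, and the function names `scale_box_to_image` and `iou` are hypothetical.

```python
def scale_box_to_image(norm_box, image_width, image_height):
    """Map a normalized (ymin, xmin, ymax, xmax) box to pixel (x_min, y_min, x_max, y_max)."""
    ymin, xmin, ymax, xmax = norm_box
    return (
        int(round(xmin * image_width)),
        int(round(ymin * image_height)),
        int(round(xmax * image_width)),
        int(round(ymax * image_height)),
    )


def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Because the boxes are normalized, the same detector output can be applied to the full-resolution image regardless of the 320 by 320 size used for inference. A detection would then count as correct when its IOU with the ground-truth box meets the chosen threshold.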

Plate Number Segmentation
http://cis.ccsenet.org Computer and Information Science Vol. 14, No. 4; 2021
In this stage, an extracted plate ROI is morphologically processed using the OpenCV open-source image processing library (Opencv-python, 2021). First, the extracted plate is separated into its constituent channels and the mean of each channel is computed. This enables the system to determine the type of plate number being processed from the color information (Attah et al., 2016). Next, the input image is converted to grayscale and then binarized using either Otsu or adaptive thresholding, chosen based on the average number of edges present in the input image. The image is then cropped to remove irrelevant information at its top and bottom by comparing the plate's length and width to those of a standard Nigerian license plate. Finally, the extracted license plate strip is passed to the OCR engine for character recognition. Figures 1 to 5 show the various morphological preprocessing operations in the segmentation step.
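To make the Otsu branch of the thresholding step concrete, the sketch below computes an Otsu threshold in plain Python over a flat list of 8-bit grayscale values. It is an illustration only: the study uses OpenCV for this step, and the function names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the 8-bit threshold that maximizes the between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        # Pixels with value <= t form the dark class, the rest the bright class.
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, threshold):
    """Map values above the threshold to 255 (light) and the rest to 0 (dark)."""
    return [255 if p > threshold else 0 for p in pixels]
```

With dark characters on a light plate, this leaves the characters black on a white background, which is the form the OCR engine expects. Adaptive thresholding differs in that it computes a separate threshold for each local neighborhood, which helps when lighting varies across the plate.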

Optical Character Recognition
In this stage, the plate strip from the character segmentation stage is passed to the OCR engine for character detection and recognition. A simple plate postprocessing approach is then applied to the raw recognized text. First, whitespace and hyphens are removed from the raw text. Following that, the algorithm pattern-matches the raw text based on the observed license plate class. This is because government plate numbers frequently contain mixed-character agency or state distinguishing codes, whereas private and commercial license plates use exclusively alphabetic local government codes. Once a valid pattern is found in the raw text, the algorithm extracts the matched text and uses its LGA or agency code to look up its designated state or issuer information. Finally, the algorithm outputs the issuer and plate number of the provided license plate strip.
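The postprocessing steps above can be sketched with a regular expression and a lookup table. Everything specific in this sketch is an assumption made for illustration: the pattern (three letters, three digits, two letters) stands in for one private or commercial plate layout, the single-entry `LGA_TO_STATE` table is an unverified placeholder, and the name `postprocess` is hypothetical. Government and older plate layouts would need their own patterns.

```python
import re

# Assumed layout for illustration: 3-letter LGA code, 3 digits, 2 letters.
PLATE_PATTERN = re.compile(r"[A-Z]{3}[0-9]{3}[A-Z]{2}")

# Hypothetical, unverified placeholder table mapping LGA codes to issuers.
LGA_TO_STATE = {"KJA": "Lagos"}


def postprocess(raw_text, lookup=LGA_TO_STATE):
    """Clean raw OCR text and return (plate_number, issuer), or None if no match."""
    # Remove whitespace and hyphens, as described in the text.
    cleaned = re.sub(r"[\s\-]", "", raw_text).upper()
    match = PLATE_PATTERN.search(cleaned)
    if match is None:
        return None
    plate = match.group(0)
    # The leading three letters are treated as the LGA code in this sketch.
    issuer = lookup.get(plate[:3], "unknown")
    return plate, issuer
```

Using `search` rather than a full-string match lets the pattern pick the plate number out of raw text that still contains stray characters from the plate border or slogan.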
The Tesseract engine is a general-purpose long short-term memory (LSTM) OCR engine. It provides an avenue for using custom trained weights and activations bundled into a traineddata file. Our custom traineddata was fine-tuned from the base Tesseract eng-fast.traineddata file (tesseract-ocr, 2021b). 700 plate images were extracted manually from our training dataset and preprocessed into license plate strips, samples of which are shown in Figure 6. These strips were then converted to the '.tiff' image format and passed to the Tesseract trainer 'box-to-text' script, which localizes the various characters in the license plate image. Wrongly localized characters were manually corrected by adjusting the character information provided in the associated automatically generated '.txt' file. All images and their ground truth files were placed into a training file used to fine-tune the base traineddata.
Figure 6. Sample old license plate strips for OCR engine training

Results and Discussion
The plate detection pipeline was trained for 50,000 steps using the TensorFlow framework on the Google Colaboratory platform, after which it obtained a final training loss of 0.095. The normalized training loss graph for the plate detection pipeline is shown in Figure 7. To improve its performance, the trained model was converted to the TensorFlow Lite format. Instead of the standard command-line package, the OCR engine was implemented using a Python API package, which increased the OCR engine's character recognition speed. On 150 images in the collection, the OCR engine accurately recognized all plate number information, including the designated state or agency code. The engine incorrectly recognized one character on 24 license plate images and two or more characters on 26 license plate images. This equates to a license plate recognition accuracy of 75% and a single character recognition accuracy of 92%. The OCR engine had a processing time of 90 milliseconds on average. In addition, the system properly categorized all license plates in the dataset. The system takes an average of 175 milliseconds to process a license plate. This is a reasonable value for real-time detection in low-speed traffic. Table 1 provides information on the result of running the developed system on 8 different plate strips from the validation set.
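The plate-level accuracy and the throughput follow directly from the reported counts and timings. The short calculation below reproduces them; the counts come from this section and the two stage timings (80 ms and 95 ms) from the abstract, while the variable names are illustrative.

```python
total_plates = 200
fully_correct = 150
one_error = 24
two_or_more_errors = 26
# The three outcome groups account for every plate in the test set.
assert fully_correct + one_error + two_or_more_errors == total_plates

plate_accuracy = fully_correct / total_plates  # fraction of plates read with no errors

detection_ms = 80
segmentation_and_ocr_ms = 95
total_ms = detection_ms + segmentation_and_ocr_ms
images_per_second = 1000 / total_ms

print(f"{plate_accuracy:.0%} plates fully correct, {images_per_second:.1f} images/s")
# prints "75% plates fully correct, 5.7 images/s"
```

The figure of roughly 6 images per second quoted in the abstract is this 5.7 value rounded up.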

Conclusion
Using a lightweight deep learning SSD object detection pipeline, an improved character segmentation algorithm, and the tesseract OCR engine, this study has successfully developed a system that detected and recognized the plate numbers of Nigerian license plates in real-time and without the use of GPU on general-purpose PCs.
The model used in this study detected both old and new Nigerian license plates of various sizes in the validation dataset and thus demonstrates the robustness of SSD detection pipelines in fast and accurate inferencing of object instances that are not too small compared to the image size, as well as their generalization potential. The system was able to categorize license plates as private, commercial, or government, and it also provides the plate's issuance information. This advanced function can be used at traffic gates for precise access control.
While the retrained OCR engine was able to differentiate similar-looking character pairs such as 'A' and '4', 'G' and '6', and '2' and 'Z', it had difficulty distinguishing '0' from 'D' and 'B' from '8' on faded or scratched license plates.
Using an end-to-end multiclass SSD detection pipeline for character identification may improve accuracy because CNN classifiers are robust to character rotation, breakage, and skew. To sustain the rapid inferencing achieved here, a general-purpose PC with a better CPU will be required.