An Efficient Two-Level Dictionary-Based Technique for Segmentation and Compression Compound Images

Nidhal Kamel Taha El-Omari


Image data compression algorithms are essential for reducing storage space and, perhaps more importantly, for increasing transfer rates, that is, for improving both space and time complexity. Because no single encoder gives good results across all image types and contents, this paper proposes an evolvable, lossless, statistical, block-based technique for the segmentation and compression of compound (mixed) documents that contain different content types, such as pictures, graphics, and/or text.

To achieve better compression ratios, a new well-defined representation of the image is derived from the number of detected colors while retaining the same image components. First, some preliminary operations are applied to reduce noise and other variations in the scanned image. The proposed algorithm then breaks the compound document image down into equal-size square blocks. Next, based on the number of colors detected in each block, the blocks are categorized into six image objects, called classes, each containing closely interrelated pixels that share relevant attributes such as color gamut, number of colors, color occurrence, and grey level. A new representation of these coherent classes is then formed using the Lookup Dictionary Table (LUD), which is the essence of the proposed algorithm. To form distinguishable labeled regions that share the same attributes, adjacent blocks with similar color features are consolidated into single coherent entities, called segments or regions. Each region is encoded by the most applicable off-the-shelf compression technique, and the regions are then fused into a single data file, which is subjected to another compression stage to ensure better compression ratios. The proposed algorithm was applied to and tested on a database of 3151 24-bit RGB bitmap document images. The empirical results show that the overall algorithm is efficient in the long run and achieves superior storage space reduction compared with other existing algorithms: a relative reduction of 71.039% in data storage space.
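
The block-splitting, color-count classification, dictionary lookup, and second compression stage described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the class boundaries in CLASS_LIMITS, the function names, the use of zlib for the final stage, and the formula used for relative reduction are all assumptions made for illustration, since the abstract does not specify them.

```python
import zlib

# Hypothetical upper bounds on the colour count for classes 0..4; any block
# with more colours falls into class 5. The abstract states there are six
# classes but does not give the real boundaries.
CLASS_LIMITS = [1, 2, 4, 16, 256]


def split_into_blocks(image, block_size):
    """Yield (block_row, block_col, block) for equal-size square blocks.

    `image` is a list of rows, each row a list of (r, g, b) tuples. For
    simplicity, its dimensions are assumed to be multiples of `block_size`.
    """
    height, width = len(image), len(image[0])
    for top in range(0, height, block_size):
        for left in range(0, width, block_size):
            block = [row[left:left + block_size]
                     for row in image[top:top + block_size]]
            yield top // block_size, left // block_size, block


def classify_block(block):
    """Assign a class id (0..5) from the number of distinct colours."""
    n_colors = len({pixel for row in block for pixel in row})
    for class_id, limit in enumerate(CLASS_LIMITS):
        if n_colors <= limit:
            return class_id
    return len(CLASS_LIMITS)


def encode_block(block):
    """Dictionary encoding: a colour lookup table plus per-pixel indices."""
    palette = sorted({pixel for row in block for pixel in row})
    index_of = {color: i for i, color in enumerate(palette)}
    indices = [index_of[pixel] for row in block for pixel in row]
    return palette, indices


def second_stage(payload):
    """Stand-in for the final compression pass over the fused data file."""
    return zlib.compress(payload, 9)


def relative_reduction(original_size, compressed_size):
    """Percentage of storage saved (assumed definition of the metric)."""
    return 100.0 * (original_size - compressed_size) / original_size
```

Under this assumed definition, a file that shrinks from 1000 bytes to 290 bytes has a relative reduction of 71%. Merging adjacent blocks of the same class into regions and choosing a per-region encoder are omitted from the sketch.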

This work is licensed under a Creative Commons Attribution 4.0 License.