Scalable image coding based on epitomes

Abstract

In this paper, we propose a novel scheme for scalable image coding based on the concept of epitome. An epitome can be seen as a factorized representation of an image. Focusing on spatial scalability, the enhancement layer of the proposed scheme contains only the epitome of the input image. The enhancement layer pixels not contained in the epitome are then restored using two approaches inspired by local learning-based super-resolution methods. In the first method, a locally linear embedding model is learned on base layer patches and then applied to the corresponding epitome patches to reconstruct the enhancement layer. The second approach learns linear mappings between pairs of co-located base layer and epitome patches. Experiments show that significant rate-distortion improvements can be achieved compared with the scalable extension of HEVC (SHVC).

Publication
IEEE Transactions on Image Processing

Context and Goal

The concept of epitome was first introduced by Jojic et al. as a condensed representation of an image (its size is only a fraction of the original size) containing the essence of its textural properties. This original epitomic model is based on a patch-based probabilistic approach, and has found applications in segmentation, denoising, recognition, indexing and texture synthesis.
Several epitomic models have since been proposed, such as the factorized representation of Wang et al. dedicated to texture mapping, or its extension designed for image coding purposes by Chérigui et al. In the latter case, the epitome is the union of epitome charts, which are pieces of repeatable textures found in the image. The search for self-similar or repeatable texture patterns, based on the KLT or a block matching (BM) algorithm, is known to be memory- and time-consuming.
In this work, we propose a clustering-based technique to reduce the complexity of the self-similarity search.
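The clustering idea can be illustrated as follows: instead of matching every block against every candidate in the image, blocks are first grouped into clusters of similar content, and exhaustive matching is restricted to each block's own cluster. This is a minimal sketch, not the paper's actual algorithm: the use of plain k-means on raw patch vectors and the squared-error threshold are assumptions made for illustration.

```python
import numpy as np

def extract_blocks(img, b):
    """Split a grayscale image into non-overlapping b x b blocks (as vectors)."""
    h, w = img.shape
    blocks, coords = [], []
    for y in range(0, h - b + 1, b):
        for x in range(0, w - b + 1, b):
            blocks.append(img[y:y + b, x:x + b].ravel())
            coords.append((y, x))
    return np.array(blocks, dtype=np.float64), coords

def kmeans(data, k, iters=10, seed=0):
    """Plain k-means; returns the cluster label of each block."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        d = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = data[labels == c].mean(0)
    return labels

def clustered_matches(blocks, labels, tol):
    """For each block, search repeatable textures only inside its own cluster."""
    matches = {}
    for i, blk in enumerate(blocks):
        cand = np.where(labels == labels[i])[0]
        cand = cand[cand != i]
        if len(cand) == 0:
            matches[i] = []
            continue
        err = ((blocks[cand] - blk) ** 2).mean(1)
        matches[i] = cand[err < tol].tolist()
    return matches
```

The match list then drives the construction of the epitome charts: blocks with many matches are good candidates for repeatable texture pieces.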

Approach

The main steps of the proposed scheme for scalable image coding are depicted in Fig. 1.

In the proposed scheme, the enhancement layer (EL) consists of an epitome of the input image. Consequently, at the decoder side, the EL patches not contained in the epitome are missing, but the corresponding base layer (BL) patches are known. We thus propose to restore the full enhancement layer by taking advantage of the known representative texture patches available in the EL epitome charts. (More details on the epitome generation are available [here]() or in the papers listed at the end of this page.)

The epitomes are encoded with a scalable scheme as an enhancement layer. The blocks not belonging to the epitome are directly copied from the decoded base layer, so their rate cost is practically negligible.
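Before restoration, the decoder can assemble an initial enhancement layer by keeping the decoded epitome blocks and filling the remaining positions from the upsampled decoded base layer. A minimal numpy sketch, in which the boolean epitome mask and the nearest-neighbour upsampling are illustrative assumptions (SHVC uses a dedicated upsampling filter):

```python
import numpy as np

def assemble_initial_el(epitome_el, bl, epitome_mask):
    """Compose the initial enhancement layer before restoration.

    epitome_el   : decoded EL, valid texture only inside the epitome charts
    bl           : decoded base layer (half resolution, factor 2x2)
    epitome_mask : boolean map, True where the pixel belongs to an epitome chart
    """
    # Nearest-neighbour upsampling stands in for a proper interpolation filter.
    bl_up = np.repeat(np.repeat(bl, 2, axis=0), 2, axis=1)
    # Keep epitome pixels, fall back to the upsampled BL elsewhere.
    return np.where(epitome_mask, epitome_el, bl_up)
```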

The non-epitome part of the enhancement layer is restored using methods derived from local learning-based super-resolution techniques, which can be summarized in the following three steps: K-NN search, learning step, and processing step. These steps are shown in Fig. 2. A first method, denoted E-LLE, relies on Locally Linear Embedding. A second technique, called E-LLM and based on Local Linear Mapping, is also studied.
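The three steps above can be sketched for the E-LLE method as follows: the K nearest neighbours of the co-located BL patch are searched among the BL patches whose EL counterparts lie in the epitome, LLE weights are computed on those BL neighbours, and the same weights are applied to the corresponding epitome EL patches. The code is a simplified sketch assuming patch vectors have already been extracted; the regularization constant is an assumption.

```python
import numpy as np

def lle_weights(x, neighbors, reg=1e-3):
    """Weights w (summing to 1) minimizing ||x - w @ neighbors||^2."""
    z = neighbors - x                      # shift neighbours to the query patch
    g = z @ z.T                            # local Gram matrix
    tr = np.trace(g)
    g += reg * (tr if tr > 0 else 1.0) * np.eye(len(g))   # regularize
    w = np.linalg.solve(g, np.ones(len(g)))
    return w / w.sum()

def restore_patch(bl_patch, bl_dict, el_dict, k=5):
    """E-LLE style restoration of one missing enhancement-layer patch.

    bl_patch : co-located base-layer patch (vector) of the missing EL patch
    bl_dict  : BL patches whose EL counterparts are inside the epitome (N x d)
    el_dict  : the corresponding epitome EL patches (N x D)
    """
    # 1) K-NN search in the base layer
    d2 = ((bl_dict - bl_patch) ** 2).sum(1)
    nn = np.argsort(d2)[:k]
    # 2) learning step: LLE weights on the BL neighbours
    w = lle_weights(bl_patch, bl_dict[nn])
    # 3) processing step: apply the same weights to the epitome EL patches
    return w @ el_dict[nn]
```

Overlapping restored patches would then be averaged to produce the final enhancement layer.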

Fig. 1 - Proposed scheme for scalable image coding.

Fig. 2 - Main steps of the epitome-based restoration.
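In the same spirit, the E-LLM variant replaces the LLE weights with a linear mapping learned between the K BL/epitome patch pairs, which is then applied to the input BL patch. A hedged sketch; the ridge-regularized least-squares solver is an assumption on the exact learning procedure.

```python
import numpy as np

def restore_patch_llm(bl_patch, bl_dict, el_dict, k=8, reg=1e-3):
    """E-LLM style restoration: learn a local linear mapping BL -> EL.

    The mapping M minimizes ||el_dict[nn] - bl_dict[nn] @ M||^2 (+ ridge term)
    over the K nearest BL neighbours, then is applied to bl_patch.
    """
    # 1) K-NN search in the base layer
    d2 = ((bl_dict - bl_patch) ** 2).sum(1)
    nn = np.argsort(d2)[:k]
    X, Y = bl_dict[nn], el_dict[nn]
    # 2) learning step: ridge-regularized least squares for the mapping M
    d = X.shape[1]
    M = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ Y)
    # 3) processing step: map the BL patch to the enhancement layer
    return bl_patch @ M
```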

Experimental Results

The experiments are performed on the test images listed in Table 1, obtained from the HEVC test sequences. The base layer images are obtained by downsampling the input image by a factor of 2 in each dimension, using the SHVC downsampling filter available with the SHM software (ver. 9.0). The BL images are encoded with HEVC, using the HM software (ver. 15.0). We then use the SHM software (ver. 9.0) to encode the corresponding enhancement layers. Both layers are encoded with the following quantization parameters: QP = 22, 27, 32, 37.

For each input image, 3 to 4 epitomes of different sizes are generated, ranging from about 30% to 95% of the input image size.

Table 1 - Test images
| Class | Image           | Size      |
|-------|-----------------|-----------|
| B     | BasketballDrive | 1920x1080 |
| B     | Cactus          | 1920x1080 |
| B     | Ducks           | 1920x1080 |
| B     | Kimono          | 1920x1080 |
| B     | ParkScene       | 1920x1080 |
| B     | Tennis          | 1920x1080 |
| B     | Terrace         | 1920x1080 |
| C     | BasketballDrill | 832x480   |
| C     | Keiba           | 832x480   |
| C     | Mall            | 832x480   |
| C     | PartyScene      | 832x480   |
| D     | BasketballPass  | 416x240   |
| D     | BlowingBubbles  | 416x240   |
| D     | RaceHorses      | 416x240   |
| D     | Square          | 416x240   |
| E     | City            | 1280x720  |

We show in Fig. 3 the Bjontegaard rate gains averaged over all sequences as a function of the epitome size. The complete results are given in Table 2. We show in Fig. 4 the RD curves for the City image, whose behavior is representative of the set of test images. We first show (left) the RD curves of both the E-LLE and E-LLM methods with the largest epitome size (best RD performance). We then show (right) the RD curves of the E-LLE method with different epitome sizes.
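For reference, the Bjontegaard delta-rate between two RD curves (each given here as four (rate, PSNR) points, one per QP) is computed by fitting a third-order polynomial of log-rate as a function of PSNR and integrating the difference over the common PSNR interval. A minimal sketch of this standard computation:

```python
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Average bitrate difference (%) of the test curve against the reference."""
    lr_ref, lr_test = np.log(rates_ref), np.log(rates_test)
    # Cubic fit of log-rate as a function of quality (PSNR)
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)
    p_test = np.polyfit(psnr_test, lr_test, 3)
    # Integrate over the overlapping PSNR interval
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    P_ref, P_test = np.polyint(p_ref), np.polyint(p_test)
    avg_diff = (np.polyval(P_test, hi) - np.polyval(P_test, lo)
                - np.polyval(P_ref, hi) + np.polyval(P_ref, lo)) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100   # negative values mean rate savings
```

A negative BD-rate, as reported in Table 2, means the proposed method needs less bitrate than SHVC for the same quality.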

Fig. 3 - Average RD performances of the different restoration methods against SHVC depending on the epitome size.

Fig. 4 - Left: E-LLE and E-LLM methods, epitome size = 91.59% of the input image. Right: E-LLE method with different epitome sizes.

We show in Fig. 5 the running times of the different methods for each image class (i.e. size) depending on the epitome size. On the left, we show the running time of the epitome generation at the encoder side; on the right, the running time of the restoration step at the decoder side. Note that the epitome generation algorithm was implemented in C++, while the restoration methods were implemented in Matlab.

Fig. 5 - Left: Epitome generation running time depending on the epitome size for different image classes. Right: Post-processing running time of the different restoration methods depending on the epitome size for different image classes.

The table below lists the complete Bjontegaard rate gains obtained with the proposed methods against SHVC.

Table 2 - Bjontegaard rate gains against SHVC depending on the epitome size.

| Image           | Epitome size (% of input image) | E-LLE BD rate gain (%) | E-LLM BD rate gain (%) |
|-----------------|---------------------------------|------------------------|------------------------|
| BasketballDrive | 90.62                           | -20.07                 | -19.61                 |
|                 | 64.10                           | -15.94                 | -13.70                 |
|                 | 49.33                           | -13.66                 | -8.89                  |
|                 | 32.34                           | -9.70                  | -0.08                  |
| Cactus          | 79.85                           | -18.19                 | -16.46                 |
|                 | 71.24                           | -17.67                 | -15.14                 |
|                 | 60.66                           | -16.33                 | -13.01                 |
|                 | 48.33                           | -11.63                 | -7.42                  |
| Ducks           | 89.63                           | -19.52                 | -19.07                 |
|                 | 77.41                           | -16.71                 | -14.21                 |
|                 | 48.28                           | 2.88                   | 10.28                  |
| Kimono          | 90.13                           | -21.75                 | -21.42                 |
|                 | 75.53                           | -15.98                 | -12.63                 |
|                 | 59.36                           | -17.37                 | -15.08                 |
|                 | 35.34                           | -15.82                 | -12.31                 |
| ParkScene       | 86.58                           | -16.88                 | -16.45                 |
|                 | 73.55                           | -15.10                 | -13.84                 |
|                 | 61.99                           | -10.69                 | -7.50                  |
|                 | 47.18                           | -3.94                  | 2.89                   |
| Tennis          | 64.49                           | -23.04                 | -21.91                 |
|                 | 50.44                           | -22.03                 | -19.61                 |
|                 | 43.12                           | -19.90                 | -16.59                 |
|                 | 32.22                           | -18.42                 | -13.13                 |
| Terrace         | 78.46                           | -13.27                 | -12.49                 |
|                 | 66.39                           | -11.32                 | -9.57                  |
|                 | 53.31                           | -6.81                  | -3.01                  |
|                 | 43.50                           | -0.50                  | 9.03                   |
| City            | 91.59                           | -10.05                 | -8.76                  |
|                 | 82.44                           | -6.24                  | -1.59                  |
|                 | 66.81                           | 3.27                   | 17.75                  |
|                 | 39.52                           | 28.00                  | 59.96                  |
| BasketballDrill | 87.05                           | -6.52                  | -5.50                  |
|                 | 59.94                           | -2.82                  | 1.08                   |
|                 | 42.63                           | -1.62                  | 4.44                   |
|                 | 28.53                           | 3.18                   | 13.08                  |
| Keiba           | 93.59                           | -6.71                  | -6.42                  |
|                 | 81.28                           | -3.69                  | -1.99                  |
|                 | 63.53                           | 3.24                   | 7.75                   |
|                 | 40.77                           | 16.06                  | 23.52                  |
| Mall            | 92.95                           | -18.20                 | -16.76                 |
|                 | 76.28                           | -0.50                  | -2.13                  |
|                 | 66.15                           | -4.13                  | 3.04                   |
|                 | 50.26                           | 6.75                   | 27.54                  |
| PartyScene      | 94.82                           | -5.44                  | -4.29                  |
|                 | 81.12                           | -1.18                  | 7.20                   |
|                 | 67.56                           | 8.83                   | 25.96                  |
|                 | 49.13                           | 26.89                  | 57.15                  |
| BasketballPass  | 77.76                           | -16.17                 | -13.15                 |
|                 | 66.60                           | -14.07                 | -7.25                  |
|                 | 56.41                           | -0.53                  | 10.48                  |
|                 | 42.31                           | 5.72                   | 28.21                  |
| BlowingBubbles  | 87.56                           | -6.33                  | -3.29                  |
|                 | 73.33                           | -2.73                  | 3.65                   |
|                 | 58.85                           | 3.27                   | 13.95                  |
|                 | 36.92                           | 16.81                  | 44.03                  |
| RaceHorses      | 91.03                           | -16.08                 | -15.67                 |
|                 | 79.23                           | -4.49                  | -3.03                  |
|                 | 58.14                           | 6.69                   | 20.64                  |
|                 | 36.67                           | 23.45                  | 63.23                  |
| Square          | 80.77                           | -6.45                  | -2.97                  |
|                 | 71.41                           | -5.88                  | 1.08                   |
|                 | 61.09                           | -2.00                  | 4.75                   |
|                 | 48.72                           | 9.80                   | 31.68                  |

References