Scalable image coding based on epitomes

Abstract

In this paper, we propose a novel scheme for scalable image coding based on the concept of epitome. An epitome can be seen as a factorized representation of an image. Focusing on spatial scalability, the enhancement layer of the proposed scheme contains only the epitome of the input image. The enhancement layer pixels not contained in the epitome are then restored using two approaches inspired by local learning-based super-resolution methods. In the first method, a locally linear embedding model is learned on base layer patches and then applied to the corresponding epitome patches to reconstruct the enhancement layer. The second approach learns linear mappings between pairs of co-located base layer and epitome patches. Experiments show that significant rate-distortion improvements can be achieved compared with the scalable extension of HEVC (SHVC).

Publication
IEEE Transactions on Image Processing

Context and Goal

The concept of epitome was first introduced by Jojic et al. as a condensed representation of an image (its size is only a fraction of the original size) containing the essence of its textural properties. This original epitomic model is based on a patch-based probabilistic approach, and has found applications in segmentation, denoising, recognition, indexing and texture synthesis.
Several epitomic models have since been proposed, such as the factorized representation of Wang et al. dedicated to texture mapping, or its extension designed for image coding purposes by Chérigui et al. In the latter case, the epitome is the union of epitome charts, which are pieces of repeatable textures found in the image. The search for self-similar or repeatable texture patterns, based on the KLT or a block matching (BM) algorithm, is known to be memory- and time-consuming.
In this work, we propose a clustering-based technique to reduce the complexity of the self-similarity search.
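The clustering idea can be illustrated as follows: instead of matching every block against every candidate in the image, blocks are first grouped into clusters of similar content, and exhaustive matching is restricted to each block's own cluster. This is a minimal sketch, not the paper's actual algorithm: the use of plain k-means on raw patch vectors and the squared-error threshold are assumptions made for illustration.

```python
import numpy as np

def extract_blocks(img, b):
    """Split a grayscale image into non-overlapping b x b blocks (as vectors)."""
    h, w = img.shape
    blocks, coords = [], []
    for y in range(0, h - b + 1, b):
        for x in range(0, w - b + 1, b):
            blocks.append(img[y:y + b, x:x + b].ravel())
            coords.append((y, x))
    return np.array(blocks, dtype=np.float64), coords

def kmeans(data, k, iters=10, seed=0):
    """Plain k-means; returns the cluster label of each block."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        d = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = data[labels == c].mean(0)
    return labels

def clustered_matches(blocks, labels, tol):
    """For each block, search repeatable textures only inside its own cluster."""
    matches = {}
    for i, blk in enumerate(blocks):
        cand = np.where(labels == labels[i])[0]
        cand = cand[cand != i]
        if len(cand) == 0:
            matches[i] = []
            continue
        err = ((blocks[cand] - blk) ** 2).mean(1)
        matches[i] = cand[err < tol].tolist()
    return matches
```

The match list then drives the construction of the epitome charts: blocks with many matches are good candidates for repeatable texture pieces.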

Approach

The main steps of the proposed scheme for scalable image coding are depicted in Fig. 1.

In the proposed scheme, the enhancement layer (EL) consists of an epitome of the input image. Consequently, at the decoder side, the EL patches not contained in the epitome are missing, but the corresponding base layer (BL) patches are known. We thus propose to restore the full enhancement layer by taking advantage of the known representative texture patches available in the EL epitome charts. (More details on the epitome generation are available [here]() or in the papers listed at the end of this page.)

The epitomes are encoded with a scalable scheme as an enhancement layer. The blocks not belonging to the epitome are directly copied from the decoded base layer, so their rate cost is practically negligible.
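Before restoration, the decoder can assemble an initial enhancement layer by keeping the decoded epitome blocks and filling the remaining positions from the upsampled decoded base layer. A minimal numpy sketch, in which the boolean epitome mask and the nearest-neighbour upsampling are illustrative assumptions (SHVC uses a dedicated upsampling filter):

```python
import numpy as np

def assemble_initial_el(epitome_el, bl, epitome_mask):
    """Compose the initial enhancement layer before restoration.

    epitome_el   : decoded EL, valid texture only inside the epitome charts
    bl           : decoded base layer (half resolution, factor 2x2)
    epitome_mask : boolean map, True where the pixel belongs to an epitome chart
    """
    # Nearest-neighbour upsampling stands in for a proper interpolation filter.
    bl_up = np.repeat(np.repeat(bl, 2, axis=0), 2, axis=1)
    # Keep epitome pixels, fall back to the upsampled BL elsewhere.
    return np.where(epitome_mask, epitome_el, bl_up)
```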

The non-epitome part of the enhancement layer is restored using methods derived from local learning-based super-resolution techniques, which can be summarized in the following three steps: K-NN search, learning step, and processing step. These steps are shown in Fig. 2. A first method, denoted E-LLE, relies on Locally Linear Embedding. A second technique, called E-LLM and based on Local Linear Mapping, is also studied.
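The three steps above can be sketched for the E-LLE method as follows: the K nearest neighbours of the co-located BL patch are searched among the BL patches whose EL counterparts lie in the epitome, LLE weights are computed on those BL neighbours, and the same weights are applied to the corresponding epitome EL patches. The code is a simplified sketch assuming patch vectors have already been extracted; the regularization constant is an assumption.

```python
import numpy as np

def lle_weights(x, neighbors, reg=1e-3):
    """Weights w (summing to 1) minimizing ||x - w @ neighbors||^2."""
    z = neighbors - x                      # shift neighbours to the query patch
    g = z @ z.T                            # local Gram matrix
    tr = np.trace(g)
    g += reg * (tr if tr > 0 else 1.0) * np.eye(len(g))   # regularize
    w = np.linalg.solve(g, np.ones(len(g)))
    return w / w.sum()

def restore_patch(bl_patch, bl_dict, el_dict, k=5):
    """E-LLE style restoration of one missing enhancement-layer patch.

    bl_patch : co-located base-layer patch (vector) of the missing EL patch
    bl_dict  : BL patches whose EL counterparts are inside the epitome (N x d)
    el_dict  : the corresponding epitome EL patches (N x D)
    """
    # 1) K-NN search in the base layer
    d2 = ((bl_dict - bl_patch) ** 2).sum(1)
    nn = np.argsort(d2)[:k]
    # 2) learning step: LLE weights on the BL neighbours
    w = lle_weights(bl_patch, bl_dict[nn])
    # 3) processing step: apply the same weights to the epitome EL patches
    return w @ el_dict[nn]
```

Overlapping restored patches would then be averaged to produce the final enhancement layer.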

Fig. 1 - Proposed scheme for scalable image coding.

Fig. 2 - Main steps of the epitome-based restoration.
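In the same spirit, the E-LLM variant replaces the LLE weights with a linear mapping learned between the K BL/epitome patch pairs, which is then applied to the input BL patch. A hedged sketch; the ridge-regularized least-squares solver is an assumption on the exact learning procedure.

```python
import numpy as np

def restore_patch_llm(bl_patch, bl_dict, el_dict, k=8, reg=1e-3):
    """E-LLM style restoration: learn a local linear mapping BL -> EL.

    The mapping M minimizes ||el_dict[nn] - bl_dict[nn] @ M||^2 (+ ridge term)
    over the K nearest BL neighbours, then is applied to bl_patch.
    """
    # 1) K-NN search in the base layer
    d2 = ((bl_dict - bl_patch) ** 2).sum(1)
    nn = np.argsort(d2)[:k]
    X, Y = bl_dict[nn], el_dict[nn]
    # 2) learning step: ridge-regularized least squares for the mapping M
    d = X.shape[1]
    M = np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ Y)
    # 3) processing step: map the BL patch to the enhancement layer
    return bl_patch @ M
```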

Experimental Results

The experiments are performed on the test images listed in Table 1, obtained from the HEVC test sequences. The base layer images are obtained by downsampling the input image by a factor of 2 in each dimension, using the SHVC downsampling filter available with the SHM software (ver. 9.0). The BL images are encoded with HEVC, using the HM software (ver. 15.0). We then use the SHM software (ver. 9.0) to encode the corresponding enhancement layers. Both layers are encoded with the following quantization parameters: QP = 22, 27, 32, 37.

For each input image, 3 to 4 epitomes of different sizes are generated, ranging from about 30% to 95% of the input image size.

Table 1 - Test images
| Class | Image           | Size      |
|-------|-----------------|-----------|
| B     | BasketballDrive | 1920x1080 |
| B     | Cactus          | 1920x1080 |
| B     | Ducks           | 1920x1080 |
| B     | Kimono          | 1920x1080 |
| B     | ParkScene       | 1920x1080 |
| B     | Tennis          | 1920x1080 |
| B     | Terrace         | 1920x1080 |
| C     | BasketballDrill | 832x480   |
| C     | Keiba           | 832x480   |
| C     | Mall            | 832x480   |
| C     | PartyScene      | 832x480   |
| D     | BasketballPass  | 416x240   |
| D     | BlowingBubbles  | 416x240   |
| D     | RaceHorses      | 416x240   |
| D     | Square          | 416x240   |
| E     | City            | 1280x720  |

We show in Fig. 3 the Bjontegaard rate gains averaged over all sequences as a function of the epitome size. The complete results are given in Table 2. We show in Fig. 4 the RD curves for the City image, whose behavior is representative of the set of test images. We first show (left) the RD curves of both the E-LLE and E-LLM methods with the largest epitome size (best RD performance). We then show (right) the RD curves of the E-LLE method with different epitome sizes.
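For reference, the Bjontegaard delta-rate between two RD curves (each given here as four (rate, PSNR) points, one per QP) is computed by fitting a third-order polynomial of log-rate as a function of PSNR and integrating the difference over the common PSNR interval. A minimal sketch of this standard computation:

```python
import numpy as np

def bd_rate(rates_ref, psnr_ref, rates_test, psnr_test):
    """Average bitrate difference (%) of the test curve against the reference."""
    lr_ref, lr_test = np.log(rates_ref), np.log(rates_test)
    # Cubic fit of log-rate as a function of quality (PSNR)
    p_ref = np.polyfit(psnr_ref, lr_ref, 3)
    p_test = np.polyfit(psnr_test, lr_test, 3)
    # Integrate over the overlapping PSNR interval
    lo = max(min(psnr_ref), min(psnr_test))
    hi = min(max(psnr_ref), max(psnr_test))
    P_ref, P_test = np.polyint(p_ref), np.polyint(p_test)
    avg_diff = (np.polyval(P_test, hi) - np.polyval(P_test, lo)
                - np.polyval(P_ref, hi) + np.polyval(P_ref, lo)) / (hi - lo)
    return (np.exp(avg_diff) - 1) * 100   # negative values mean rate savings
```

A negative BD-rate, as reported in Table 2, means the proposed method needs less bitrate than SHVC for the same quality.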

Fig. 3 - Average RD performances of the different restoration methods against SHVC depending on the epitome size.

Fig. 4 - Left: E-LLE and E-LLM methods, epitome size = 91.59% of the input image. Right: E-LLE method with different epitome sizes.

We show in Fig. 5 the running times of the different methods for each image class (i.e. size) depending on the epitome size. On the left, we show the running time of the epitome generation at the encoder side; on the right, the running time of the restoration step at the decoder side. Note that the epitome generation algorithm was implemented in C++, while the restoration methods were implemented in Matlab.

Fig. 5 - Left: Epitome generation running time depending on the epitome size for different image classes. Right: Post-processing running time of the different restoration methods depending on the epitome size for different image classes.

The table below lists the complete Bjontegaard rate gains obtained with the proposed methods against SHVC.

Table 2 - Bjontegaard rate gains against SHVC depending on the epitome size.

| Image           | Epitome size (% of input image) | E-LLE BD rate gain (%) | E-LLM BD rate gain (%) |
|-----------------|---------------------------------|------------------------|------------------------|
| BasketballDrive | 90.62                           | -20.07                 | -19.61                 |
|                 | 64.10                           | -15.94                 | -13.70                 |
|                 | 49.33                           | -13.66                 | -8.89                  |
|                 | 32.34                           | -9.70                  | -0.08                  |
| Cactus          | 79.85                           | -18.19                 | -16.46                 |
|                 | 71.24                           | -17.67                 | -15.14                 |
|                 | 60.66                           | -16.33                 | -13.01                 |
|                 | 48.33                           | -11.63                 | -7.42                  |
| Ducks           | 89.63                           | -19.52                 | -19.07                 |
|                 | 77.41                           | -16.71                 | -14.21                 |
|                 | 48.28                           | 2.88                   | 10.28                  |
| Kimono          | 90.13                           | -21.75                 | -21.42                 |
|                 | 75.53                           | -15.98                 | -12.63                 |
|                 | 59.36                           | -17.37                 | -15.08                 |
|                 | 35.34                           | -15.82                 | -12.31                 |
| ParkScene       | 86.58                           | -16.88                 | -16.45                 |
|                 | 73.55                           | -15.10                 | -13.84                 |
|                 | 61.99                           | -10.69                 | -7.50                  |
|                 | 47.18                           | -3.94                  | 2.89                   |
| Tennis          | 64.49                           | -23.04                 | -21.91                 |
|                 | 50.44                           | -22.03                 | -19.61                 |
|                 | 43.12                           | -19.90                 | -16.59                 |
|                 | 32.22                           | -18.42                 | -13.13                 |
| Terrace         | 78.46                           | -13.27                 | -12.49                 |
|                 | 66.39                           | -11.32                 | -9.57                  |
|                 | 53.31                           | -6.81                  | -3.01                  |
|                 | 43.50                           | -0.50                  | 9.03                   |
| City            | 91.59                           | -10.05                 | -8.76                  |
|                 | 82.44                           | -6.24                  | -1.59                  |
|                 | 66.81                           | 3.27                   | 17.75                  |
|                 | 39.52                           | 28.00                  | 59.96                  |
| BasketballDrill | 87.05                           | -6.52                  | -5.50                  |
|                 | 59.94                           | -2.82                  | 1.08                   |
|                 | 42.63                           | -1.62                  | 4.44                   |
|                 | 28.53                           | 3.18                   | 13.08                  |
| Keiba           | 93.59                           | -6.71                  | -6.42                  |
|                 | 81.28                           | -3.69                  | -1.99                  |
|                 | 63.53                           | 3.24                   | 7.75                   |
|                 | 40.77                           | 16.06                  | 23.52                  |
| Mall            | 92.95                           | -18.20                 | -16.76                 |
|                 | 76.28                           | -0.50                  | -2.13                  |
|                 | 66.15                           | -4.13                  | 3.04                   |
|                 | 50.26                           | 6.75                   | 27.54                  |
| PartyScene      | 94.82                           | -5.44                  | -4.29                  |
|                 | 81.12                           | -1.18                  | 7.20                   |
|                 | 67.56                           | 8.83                   | 25.96                  |
|                 | 49.13                           | 26.89                  | 57.15                  |
| BasketballPass  | 77.76                           | -16.17                 | -13.15                 |
|                 | 66.60                           | -14.07                 | -7.25                  |
|                 | 56.41                           | -0.53                  | 10.48                  |
|                 | 42.31                           | 5.72                   | 28.21                  |
| BlowingBubbles  | 87.56                           | -6.33                  | -3.29                  |
|                 | 73.33                           | -2.73                  | 3.65                   |
|                 | 58.85                           | 3.27                   | 13.95                  |
|                 | 36.92                           | 16.81                  | 44.03                  |
| RaceHorses      | 91.03                           | -16.08                 | -15.67                 |
|                 | 79.23                           | -4.49                  | -3.03                  |
|                 | 58.14                           | 6.69                   | 20.64                  |
|                 | 36.67                           | 23.45                  | 63.23                  |
| Square          | 80.77                           | -6.45                  | -2.97                  |
|                 | 71.41                           | -5.88                  | 1.08                   |
|                 | 61.09                           | -2.00                  | 4.75                   |
|                 | 48.72                           | 9.80                   | 31.68                  |

References