Learning clustering-based linear mappings for quantization noise removal

This paper describes a novel scheme to reduce the quantization noise of compressed videos and improve the overall coding performances. The proposed scheme first consists in clustering noisy patches of the compressed sequence. Then, at the encoder side, linear mappings are learned for each cluster between the noisy patches and the corresponding source patches. The linear mappings are then transmitted to the decoder where they can be applied to perform de-noising. The method has been tested with the HEVC standard, leading to a bitrate saving of up to 9.63%.

Context and Goal

Modern video codecs such as HEVC rely on four main steps, which aim at reducing the redundancies of the signal. First, spatial and temporal redudancies are reduced through prediction tools. The prediction residue is then transformed to further reduce the inner correlations of the signal, and the transformed coefficients are then quantized to remove non-perceptible information. The quantized transformed coefficients are finally entropy coded to remove the remaining statistical redundancies. Because the quantization step is not revertible at the decoder side, the reconstructed frames are corrupted with a so-called quantization noise. The goal of this work is to improve the rate-distortion (RD) performances of existing compression schemes using a novel out-of-the-loop de-noising scheme.

Approach

The proposed scheme is represented in Fig. 1, and consists in the following steps:

At the encoder side:
- cluster the coded/decoded patches
- for each cluster, learn a projection function using multivariate linear regression between the coded/decoded patches and the corresponding source patches
- encode the projection function and transmit them to the decoder
At the decoder side:
- decode the projection functions
- cluster the decoded patches, as it is performed at the encoder side
- for each cluster, apply the corresponding projection function to the decoded patches in order to obtain the de-noised patches

Proposed compression scheme using noise removal based on clustering and linear mapping learning.

Alternatively, we propose a two steps scheme represented in Fig. 2, where a blind de-noising algorithm is first performed on the decoded frames, before applying the proposed mehtods.

Alternative two step schemes with first pass blind de-noising.

Selecting the number of clusters

Selecting the number of clusters K is crucial for the overall RD performances, as increasing K improves the de-noising performances, but also the bit-rate associated to the projection functions. We thus propose an adative method to select K, based on a rate-distortion optimization (RDO) criterion, which proceeds as follow:

At initialization, the full data set is considered as one cluster
Each cluster is then recursively split in 2 clusters if the RDO criterion (explained below) is satisfied, creating a binary tree structure
The procedure stops when no RDO criterion is satisfied, or a certain maximum tree depth is reached

To perform the exact same partitioning at the decoder, we then need to transmit the binary tree structure.
The aforementioned RDO criterion can be computed for each cluster, first by estimating the distortion between the de-noised patches of the cluster and the corresponding source patches. The rate is then esitmated as the number of bits of the encoded projection function used to de-noise the patches. Thus, to decide if a cluster $C_{n}$ is further divided into two clusters $C_{n+1}$ and $C_{n+2}$, we merely check the following condition on the associated rate distortion costs: $J_{n+1} + J_{n+2} < J_n$. An example of binary tree structure obtained after recursive paritioning is given in Fig. 3.

Example of a cluster partition based on a binary tree structure and an RDO criterion. The clusters are recursively split to reach 4 clusters. The tree structure is described by only 5 bits.

Oracle clustering

We propose a so-called oracle clustering, which goal is to maximize the de-noising performances of the proposed scheme for a fixed number of clusters $K$. Formally, the oracle clustering solve the following problem: $$\min_C \sum_{i=1 \dots K} \sum_{x_d,x_s \in C_i} || x_s - \textbf{P}_i x_d || ^2 $$ where $x_s$ and $x_d$ are corresponding source and decoded patches respectively, and $\textbf{P}_i$ is the projection function corresponding to the cluster $C_i$.
This oracle clustering is not reproducible at the decoder side, as it relies on the knowledge of the source patches, and thus can not be directly applied in the proposed scheme. However, it allows to define an upper bound on the performances.


First frame of the Kimono source sequence.	First frame of the Kimono sequence encoded with HEVC at QP=37. Decoded PSNR = 37.95dB.

K-means clustering (K=10) performed on non-overlapping 4x4 patches of the Kimono sequence encoded with HEVC at QP=37. Each color correspond to a cluster label. Corresponding de-noised PSNR = 38.07 dB.	Oracle clustering (K=10) performed on non-overlapping 4x4 patches of the Kimono sequence encoded with HEVC at QP=37. Each color correspond to a cluster label. Corresponding de-noised PSNR = 41.26 dB.

Experimental Results

The sequences used in the experiment are presented in Table 1, and mainly consist in HEVC test sequences. The test sequences are processed per Group Of Pictures (GOP), where a GOP contains a number of frames equal to the frame rate (i.e. we have one GOP per second). The sequences are encoded with the HEVC test model HM (ver 15.0) using the Main profile in Random Access, with 4 values for the Quantization Parameter, QP=22, 27, 32, 37.

Table 1 - Test sequences
Sequence	Resolution	Frame count	Frame rate
City	1280x720	600	60
ParkScene	1920x1080	240	24
Tennis	1920x1080	240	24
Kimono	1920x1080	240	24
Cactus	1920x1080	500	50
Terrace	1920x1080	600	60
Basket	1920x1080	500	50
Ducks	1920x1080	500	50
PeopleOnStreet	2560x1600	150	30
Traffic	2560x1600	150	30

We show in Table 2 the RD performances of our method (without first pass de-noising, see Fig. 1) for each GOP of the test sequences, performed with fixed number of clusters K=10.

Table 2 - RD performances per GOP of K-means clustering with a fixed number of clusters K=10 (Bjontegaard bit-rate gains with respect to HEVC)
GOP	City	Park Scene	Tennis	Kimono	Cactus	Terrace	Basket	Ducks	People On Street	Traffic	Average
1	-2.31	0.12	-0.06	-0.13	-1.16	-5.38	-0.83	-2.32	-2.54	-1.29	-1.59
2	-1.76	0.15	0.09	-0.13	-1.22	-7.14	-0.43	-1.81	-2.55	-1.23	-1.60
3	-1.96	0.09	0.18	-0.20	-0.88	-8.62	-0.87	-1.02	-2.54	-1.22	-1.70
4	-1.74	0.18	0.10	-0.24	-0.81	-11.69	-0.74	-1.16	-2.60	-1.18	-1.99
5	-0.96	0.30	0.10	-0.19	-1.05	-12.15	-0.65	-1.20	-2.58	-1.31	-1.97
6	-1.90	0.47	1.60	0.35	-1.20	-10.05	-0.43	-1.61	-1.60
7	-1.59	0.48	1.45	1.22	-1.11	-8.49	-0.51	-2.01	-1.32
8	-1.98	0.50	1.05	1.42	-1.34	-7.53	-0.71	-2.36	-1.37
9	-1.84	0.59	0.84	1.90	-1.03	-6.73	-0.74	-2.49	-1.19
10	-1.92	0.97	0.77	2.54	-0.91	-6.79	-0.49	-2.76	-1.07
Average	-1.81	0.38	0.08	0.43	-1.08	-8.28	-0.65	-1.88	-2.56	-1.26	-1.66

We show in Table 3 the RD performances of our method (without first pass de-noising, see Fig. 1) for each GOP of the test sequences, performed with the adaptive selection of the number of clusters (Kmax=16). In addition, we give in Table 4 the selected K values depending on the quantization parameter and averaged over all GOPs. We can see that the K values adapt to the sequence, reaching higher values for sequences with a higher resolution and/or frame rate. The K values also vary depending on QP, and lower values are selected at low bit-rate (high QP value).

Table 3 - RD performances per GOP of K-means clustering with the adaptive K selection (Kmax=16) (Bjontegaard bit-rate gains with respect to HEVC)
GOP	City	Park Scene	Tennis	Kimono	Cactus	Terrace	Basket	Ducks	People On Street	Traffic	Average
1	-2.76	-0.69	-0.90	-1.32	-1.12	-5.25	-1.10	-2.39	-2.57	-1.44	-1.96
2	-2.31	-0.67	-0.72	-1.35	-1.29	-6.83	-0.40	-1.82	-2.58	-1.39	-1.93
3	-2.45	-0.63	-0.78	-1.36	-0.89	-8.86	-0.98	-1.08	-2.59	-1.35	-2.10
4	-1.66	-0.54	-0.82	-1.31	-0.98	-11.58	-0.62	-1.24	-2.61	-1.35	-2.27
5	-1.84	-0.52	-0.72	-0.89	-1.11	-11.95	-0.76	-1.27	-2.63	-1.53	-2.32
6	-2.22	-0.48	-1.08	-0.85	-1.37	-9.90	-0.77	-1.71	-2.30
7	-1.90	-0.48	-1.91	-0.58	-1.25	-8.20	-0.68	-2.13	-2.14
8	-1.86	-0.40	-2.39	-0.58	-1.30	-7.23	-1.09	-2.47	-2.16
9	-2.51	-0.31	-1.77	-0.43	-1.19	-6.35	-0.88	-2.58	-2.00
10	-2.74	0.02	-2.52	0.29	-1.00	-6.47	-0.69	-2.84	-1.99
Average	-2.24	-0.47	-1.41	-0.94	-1.15	-8.08	-0.80	-1.96	-2.60	-1.42	-2.11

Table 4 - Selected K values for the RDO adaptive selection of the number of clusters (Averaged over all GOPs)
QP	City	Park Scene	Tennis	Kimono	Cactus	Terrace	Basket	Ducks	People On Street	Traffic	Average
22	10.8	8.5	7.6	8.2	14.9	15.4	12.0	15.9	16.0	14.0	12.3
27	11.3	6.8	5.7	6.4	11.7	15.0	9.7	15.8	16.0	12.6	11.1
32	7.4	4.0	3.4	4.5	9.0	14.1	7.8	15.8	16.0	7.6	9.0
37	5.9	4.0	2.8	4.0	7.1	13.1	5.8	15.5	15.8	5.8	8.0
Average	8.9	5.8	4.9	5.8	10.7	14.4	8.8	15.8	16.0	10.0	10.1

We show in Table 5 the RD performances of the proposed two steps scheme (see Fig. 2), with a first pass VBM3D de-noising, for the first GOP of the test sequences, performed with the adaptive selection of the number of clusters (Kmax=16). In addition, we give in Table 6 the selected K values depending on the quantization parameter.

Table 5 - RD performances of the two steps scheme with the adaptive K selection (Kmax=16) for the first GOP (Bjontegaard bit-rate gains with respect to HEVC)
Sequence	BD-rate
City	-3.24
Park Scene	-1.07
Tennis	-7.77
Kimono	-4.32
Cactus	-6.19
Terrace	-7.24
Basket	-3.80
Ducks	-4.13
People On Street	-9.63
Traffic	-5.43
Average	-5.28

Table 6 - Selected K values for the two steps scheme with the RDO adaptive selection of the number of clusters (Averaged over all GOPs)
QP	City	Park Scene	Tennis	Kimono	Cactus	Terrace	Basket	Ducks	People On Street	Traffic	Average
22	15.0	12.0	9.0	7.0	16.0	16.0	11.0	16.0	16.0	15.0	13.3
27	15.0	8.0	7.0	6.0	13.0	16.0	11.0	16.0	16.0	12.0	12.0
32	10.0	6.0	6.0	6.0	12.0	16.0	9.0	16.0	15.0	13.0	10.9
37	9.0	4.0	5.0	4.0	11.0	15.0	5.0	16.0	15.0	6.0	9.0
Average	12.3	7.5	6.8	5.8	13.0	15.8	9.0	16.0	15.5	11.5	11.3

Finally, we show in Table 7 the upper bound on the RD performances obtained with the oracle clustering.

Table 7 - Upper bound on the RD performances per GOP using the oracle clustering with a fixed number of clusters K=10 (Bjontegaard bit-rate gains with respect to HEVC)
GOP	City	Park Scene	Tennis	Kimono	Cactus	Terrace	Basket	Ducks	People On Street	Traffic	Average
1	-28.85	-22.38	-30.48	-40.90	-23.28	-29.93	-32.60	-31.03	-23.11	-25.35	-28.79
2	-28.34	-22.40	-30.39	-42.10	-24.62	-34.49	-29.80	-28.40	-23.15	-25.52	-28.92
3	-28.35	-21.86	-30.61	-41.10	-24.89	-41.78	-30.62	-18.31	-23.07	-25.31	-28.59
4	-29.20	-21.43	-30.92	-40.77	-26.09	-48.57	-27.25	-16.72	-23.88	-25.68	-29.05
5	-28.89	-21.07	-30.34	-40.85	-25.08	-46.06	-29.78	-16.08	-23.07	-25.65	-28.69
6	-29.43	-21.50	-44.94	-38.14	-24.65	-39.84	-31.07	-18.51	-31.01
7	-29.19	-21.66	-45.89	-25.38	-24.21	-38.70	-29.48	-20.69	-29.40
8	-29.08	-21.71	-46.38	-25.24	-24.71	-37.85	-30.36	-22.64	-29.75
9	-28.89	-21.37	-45.92	-25.17	-24.53	-35.55	-27.57	-23.33	-29.04
10	-29.01	-21.24	-46.14	-16.37	-12.72	-42.21	-21.39	-24.99	-26.76
Average	-28.92	-21.66	-38.20	-33.60	-23.48	-39.50	-28.99	-22.07	-23.26	-25.50	-28.52

We display below a summary of the RD performances of the different versions of the proposed scheme, in the form of RD curves for the first GOP of each sequence.


RD performances for the first GOP of the City sequence.	RD performances for the first GOP of the Park Scene sequence.

RD performances for the first GOP of the Tennis sequence.	RD performances for the first GOP of the Kimono sequence.

RD performances for the first GOP of the Cactus sequence.	RD performances for the first GOP of the Terrace sequence.

RD performances for the first GOP of the Basket sequence.	RD performances for the first GOP of the Ducks sequence.

RD performances for the first GOP of the People On Street sequence.	RD performances for the first GOP of the Traffic sequence.