Adaptive Long-term Reference Selection for Efficient Scalable Surveillance Video Coding

2018 IEEE 12th International Symposium on Embedded Multicore/Many-core Systems-on-Chip

Le Dao Thi Hue, Giap PhamVan, Xiem HoangVan
VNU – University of Engineering and Technology
hueledao94@gmail.com, giap_pham@outlook.com, xiemhoang@vnu.edu.vn

978-1-5386-6689-0/18/$31.00 ©2018 IEEE, DOI 10.1109/MCSoC2018.2018.00023

Abstract
The exponential growth of video surveillance calls for more powerful video coding solutions, characterized not only by high compression efficiency but also by adaptive video streaming capability. Surveillance video content, however, usually contains large background areas and exhibits high temporal correlation between frames. In this context, we propose a novel adaptive long-term reference mechanism for scalable surveillance video coding, which provides quality and temporal scalability while achieving high compression performance. The proposed long-term reference is mainly selected based on a content analysis of the video sequence. The long-term reference selection solution is integrated into the most recent Scalable High Efficiency Video Coding (SHVC) standard. Experiments conducted on four surveillance videos show that the proposed scalable video coding solution achieves around 5.38% bitrate saving when compared to the traditional SHVC video coding benchmark.

Keywords: Surveillance scalable video coding, SHVC standard, long-term reference, bitrate saving

1 Introduction
In recent years, surveillance systems have expanded rapidly to cope with security and safety threats. Considerable numbers of surveillance cameras have been mounted in public and private areas. The emergence of large video surveillance infrastructures produces a massive amount of content that must be stored, analyzed, and managed by security teams with limited resources. Furthermore, the heterogeneity of networks, display devices, and transmission environments has become a critical issue in the modern video communication era [1]. To meet these challenges, an efficient and adaptable surveillance video compression system is needed, one that provides not only compression efficiency but also adaptability to network and transmission variations.

Recent achievements in video coding technology have resulted in a new video coding solution, namely High Efficiency Video Coding (HEVC) [2]. As reported, HEVC significantly outperforms the well-known H.264/AVC standard [3]. For adaptive video streaming, the HEVC scalable extension, namely SHVC, was introduced in 2014 [4]. SHVC is mainly designed around a layered coding structure in which one base layer (BL) compresses the video sequence at a low, basic quality/resolution fidelity and one or several enhancement layers (ELs) provide enhanced quality/resolution fidelities. Although SHVC is the latest scalable video coding standard, its compression performance is still an active topic of research and development. The work in [5] proposed an improved EL merge mode, while the work in [6] proposed a novel joint layer prediction solution. As reported in [5, 6], the proposed EL merge mode and joint layer prediction significantly improve the compression performance beyond that of the SHVC standard. However, none of these proposals is designed for visual surveillance systems, as they target generic video content. In a visual surveillance system, cameras are usually fixed at a certain position or move within a very narrow angle. Therefore, surveillance video content usually contains a large background area and high temporal correlation between frames. To exploit these characteristics, we propose in this paper an improved SHVC compression solution designed for surveillance video content. The proposed surveillance scalable video coding (SSVC) solution is based on an adaptive long-term reference selection and updating mechanism, in which the video content is analyzed before it is used to select and update the reference picture. Experimental results show that the proposed SSVC solution significantly outperforms the relevant SHVC standard, notably with around 5.38% bitrate saving while still providing similar perceptual decoded frame quality.

The rest of this paper is organized as follows. Section 2 briefly discusses related background work on surveillance and scalable video coding. Afterwards, Section 3 describes the proposed SSVC solution. Section 4 presents and analyzes the compression performance of the proposed SSVC in comparison with the SHVC standard. Finally, Section 5 gives the main conclusions and ideas for future work.

2 Related work

2.1 SHVC standard
SHVC, an extension of the well-known HEVC standard [4], is the latest scalable video coding solution, providing adaptive video compression capability for a large number of video transmission environments and display devices. Similar to the prior SVC standard [7], SHVC follows a layered coding structure with one base layer and one or several enhancement layers. In contrast to SVC, SHVC adopts a closed-loop coding structure at each compression layer; thus, only high-level syntax (HLS) changes are needed to upgrade from HEVC to SHVC. Following the HLS approach, an inter-layer processing module is added to link the base layer with the enhancement layers. In this module, the texture and motion information derived from the BL or lower layers is processed so that it can be optimally used at the ELs. As reported in [4], SHVC provides not only the quality, temporal, and spatial scalabilities commonly supported in the SVC standard but also new bit-depth and color gamut scalability functions. It is also worth noting that SHVC is mainly designed for generic video content. Therefore, specific video content such as surveillance or conference videos may not fully benefit from its compression structure.

2.2 Surveillance video coding
Surveillance video compression has attracted considerable research due to its wide use in real surveillance and security visual systems. In an early work, X.G. Zhang et al. [8] presented an efficient coding solution for surveillance videos captured by stationary cameras. In this proposal, a high-quality background frame is generated and employed to compress the surveillance video frames. Considering the importance of the background frames, several background frame models have been presented [9, 10]. However, most of these works are developed for non-scalable video coding structures, i.e., H.264/AVC [3] and HEVC [2]. Therefore, these surveillance video coding solutions are unable to cope with dynamically changing transmission environments and the variety of display devices.

3 Proposed surveillance scalable video coding solution
To describe the proposed surveillance scalable video coding solution and its motivation, this section starts with a brief analysis of surveillance video content. Afterwards, the proposed compression architecture and its novel coding tools are presented.

3.1 Observations
Surveillance video systems have been widely used in modern life, from home security to public environments such as schools, factories, or smart transportation. In such systems, the surveillance cameras are usually set at a fixed position or move within very narrow angles. Therefore, surveillance videos usually contain a static scene with local movements, so a large temporal redundancy can be exploited in such video content. To study this, Fig. 1 shows two frames of the surveillance video Intersection, obtained from [11], together with the difference between them.

Fig. 1. (a) 1st frame; (b) 280th frame; (c) difference between (a) and (b).

As can be seen, the difference between two frames of a surveillance video is mostly background area (black regions). Although the temporal distance between the 1st and 280th frames is relatively large, the temporal correlation between them is still high. This motivates us to propose a novel scalable video coding solution, developed on top of the most recent SHVC standard and based on an adaptive long-term reference selection mechanism.

3.2 SSVC architecture
Fig. 2 illustrates the proposed SSVC architecture, in which two compression layers are presented and the proposed adaptive long-term reference selection (ALRS) mechanism is highlighted.

Fig. 2. Proposed Surveillance Scalable Video Coding architecture (BL: HEVC encoder/decoder; EL: SHVC encoder/decoder; DPBs; inter-layer processing (ILP); ALRS module; BL and EL bitstreams; reconstructed BL and EL).

SSVC coding walkthrough: After being captured by the camera sensor, a surveillance video is compressed with the proposed SSVC solution in the following main steps:
1) Adaptive Long-Term Reference Selection (ALRS): This module creates and updates the appropriate reference frame for SSVC inter prediction. The selected reference is indexed as a long-term reference and stored in the decoded picture buffer (DPB). A coding flag signals this information so that the decoder also knows the selected reference.
2) Base layer compression: After the long-term reference index is determined, the BL frame is compressed using the conventional HEVC standard. Its decoded texture and motion are stored in the DPB to be exploited later for compressing the EL frames.
3) Enhancement layer compression: The enhanced quality/resolution frames are coded in this step. First, the BL decoded information is used in inter-layer processing [4]. Together with the long-term references, the base layer references are employed to better predict the EL information. Finally, both BL and EL bitstreams are merged and sent to the decoder.
3.3 Long-term reference structure
To clarify the proposed long-term reference structure employed in SSVC, Fig. 3 illustrates the difference between the use of reference frames in the standard SHVC and in the proposed SSVC. Here, the common low-delay (LD) coding structure is examined for both cases. It should be noted that the long-term reference structure is employed for both the base and enhancement layers.

Fig. 3. The LD prediction structure of the conventional video coding standards (a) and the proposed SSVC (b).

As shown, in the conventional video coding standard a frame can be referenced by at most four subsequent consecutive frames; i.e., only frames 1, 2, 3, and 4 can refer to the decoded information of frame 0. With the proposed long-term reference, however, frames 6, 7, and so on can still refer to frame 0. This allows the high temporal correlation between frames in a surveillance video to be exploited.

3.4 Adaptive long-term reference selection
In a surveillance video, scene changes occur when a new moving object appears. In such cases, the current long-term reference may not be effective at all. To address this problem, the long-term reference is adaptively updated. Fig. 4 illustrates an example of the long-term reference updating mechanism, in which a new long-term reference (LTR) is selected based on video content analysis.

Fig. 4. Proposed ALRS solution.

In this paper, the long-term reference selection algorithm assesses the sum of absolute differences (SAD) between the current coded frame, $F_t$, and its long-term reference, $F_{LTR}$. The SAD metric is measured as:

$$SAD_t = \sum_{i=1}^{N} \left| F_t(i) - F_{LTR}(i) \right| \qquad (1)$$

Here, $i$ is the pixel index of a frame having $N$ pixels, while $t$ is the frame index in the surveillance video. To assess the correlation between the current frame and its reference, an adaptive threshold, $Th$, is computed as:

$$Th = \ldots \qquad (2)$$

In the proposed reference selection mechanism, if $SAD_t < Th$, the current LTR continues to be used; otherwise, the most recent reference frame becomes the LTR for the current and subsequent frames.

4 Performance evaluation and discussion
To assess the proposed SSVC solution, four common surveillance videos from the PKU-SVD-A dataset [11] were used in the experiments. The names and characteristics of the selected sequences are specified in Table 1, while Fig. 5 illustrates the first frame of each tested sequence.

Fig. 5. Illustration of the first frame of the tested surveillance videos: Crossroad, Intersection, Mainroad, and Overbridge.

Table 1. Summary of test conditions
Test sequences and spatial resolutions: Crossroad, 720x576; Intersection, 1600x1200; Mainroad, 1600x1200; Overbridge, 720x576
Frame rate and number of frames: 30 Hz, 297 frames
GOP structure and size: Low_delay_P (IPPP…), GOP4
Quantization parameters (QP): QPB = {38, 34, 30, 26}; QPE = QPB - …

The video compression benchmark is the state-of-the-art SHVC standard, while the proposed SSVC solution is examined in two cases: SSVC without updating the LTR (SSVC-woUpd) and SSVC with updating the LTR (SSVC-wUpd). Fig. 6 illustrates the RD performance, while Table 2 presents the BD-rate saving [12] when comparing the proposed SSVC solution to the SHVC standard for the four surveillance videos from [11].

Fig. 6. RD performance comparison for the test surveillance videos with update.

Table 2. BD-rate saving (%)
Sequence        SSVC-woUpd vs SHVC    SSVC-wUpd vs SHVC
Crossroad       -1.08                 -4.56
Overbridge      -2.57                 -6.15
Mainroad        -1.79                 -8.56
Intersection    -0.69                 -2.24
Average         -1.53                 -5.38

As shown in Table 2 and Fig. 6, both SSVC-woUpd and SSVC-wUpd achieve better compression performance than the SHVC standard, with average BD-rate savings of 1.53% and 5.38%, respectively. The proposed solutions perform well on all tested sequences, especially on surveillance videos containing low motion and a single object, e.g., Mainroad.

5 Conclusions
In this paper, we have proposed an efficient video coding solution for visual surveillance systems. The proposed surveillance
scalable video coding solution is developed on top of the SHVC standard and exploits the low-motion characteristics observed in surveillance videos through an adaptive long-term reference selection mechanism. As assessed, the proposed SSVC significantly outperforms the SHVC standard. Future work can consider improving the accuracy of the long-term reference selection mechanism or taking into account the quality of the long-term reference.

Acknowledgement
This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.01-2016.15.

References
[1] M. Valera and S. Velastin, "Intelligent distributed surveillance systems: A review," IEE Proceedings - Vision, Image and Signal Processing, vol. 152, no. 2, pp. 192–204, Apr. 2005.
[2] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, "Overview of the High Efficiency Video Coding (HEVC) Standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649–1668, Dec. 2012.
[3] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560–576, Jul. 2003.
[4] J. M. Boyce, Y. Ye, J. Chen, and A. K. Ramasubramonian, "Overview of SHVC: Scalable Extensions of the High Efficiency Video Coding (HEVC) Standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 26, no. 1, pp. 20–34, Jan. 2016.
[5] X. HoangVan, J. Ascenso, and F. Pereira, "Improving enhancement layer merge mode for HEVC scalable extension," in Picture Coding Symposium, Cairns, QLD, Australia, Jun. 2015.
[6] X. HoangVan, J. Ascenso, and F. Pereira, "Improving SHVC performance with a joint layer coding mode," in IEEE International Conference on Acoustics, Speech and Signal Processing, Shanghai, China, Mar. 2016.
[7] H. Schwarz, D. Marpe, and T. Wiegand, "Overview of the Scalable Video Coding Extension of the H.264/AVC Standard," IEEE Transactions on Circuits and Systems for Video
Technology, vol. 17, no. 9, pp. 1103–1120, Sept. 2007.
[8] X. G. Zhang, L. H. Liang, Q. Huang, Y. Z. Liu, T. J. Huang, and W. Gao, "An efficient coding scheme for surveillance videos captured by stationary cameras," in IEEE International Conference on Visual Communication and Image Processing (VCIP), pp. 77442A1–10, 2010.
[9] X. Zhang, L. Liang, Q. Huang, T. Huang, and W. Gao, "A background model based method for transcoding surveillance videos captured by stationary camera," in IEEE Picture Coding Symposium, Nagoya, Japan, pp. 78–81, 2010.
[10] X. Zhang, T. Huang, Y. Tian, and W. Gao, "Background-modeling-based adaptive prediction for surveillance video coding," IEEE Transactions on Image Processing, vol. 23, no. 2, pp. 769–784, 2014.
[11] PKU-SVD-A. [Online]. Available: http://mlg.idm.pku.edu.cn/resources/pku-svd-a.html
[12] G. Bjontegaard, "Calculation of average PSNR differences between RD curves," Doc. VCEG-M33, 13th ITU-T VCEG Meeting, Austin, TX, USA, Apr. 2001.
