Application of Neural Networks to Video Source Traffic Prediction

VIETNAM NATIONAL UNIVERSITY, HO CHI MINH CITY
UNIVERSITY OF TECHNOLOGY (TRƯỜNG ĐẠI HỌC BÁCH KHOA)

MASTER'S THESIS

APPLICATION OF NEURAL NETWORKS TO VIDEO SOURCE TRAFFIC PREDICTION

MAJOR: RADIO AND ELECTRONIC ENGINEERING
MAJOR CODE: 2.07.01

LÊ THANH TÂN

HO CHI MINH CITY, [...] 2004

THIS WORK WAS COMPLETED AT THE UNIVERSITY OF TECHNOLOGY, VIETNAM NATIONAL UNIVERSITY, HO CHI MINH CITY

Scientific supervisor: Assoc. Prof. Dr. Lê Tiến Thường
Reviewer 1: Assoc. Prof. Dr. Nguyễn Hữu Phương
Reviewer 2: Dr. Nguyễn Đức Thành

The thesis was defended before the MASTER'S THESIS EXAMINATION COMMITTEE, UNIVERSITY OF TECHNOLOGY, on 16 [...] 2004.

MINISTRY OF EDUCATION AND TRAINING
VIETNAM NATIONAL UNIVERSITY, HO CHI MINH CITY
UNIVERSITY OF TECHNOLOGY

SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness

MASTER'S THESIS ASSIGNMENT

Student's full name: LÊ THANH TÂN    Gender: Male
Date of birth: 25.04.1979    Place of birth: Phú Yên
Major: RADIO AND ELECTRONIC ENGINEERING    Student code: VTĐT13.031
Class: 13 (2002-2004)

I. THESIS TITLE: "Application of neural networks to video source traffic prediction"

II. TASKS AND CONTENT:
- Study the structure of the MPEG-1, MPEG-2, and MPEG-4 coding formats.
- Study the structure of neural networks; analyze the training algorithms and training modes.
- Design an SSP (linear single-step predictor): a recurrent network (RFMP) to predict I-VOPs.
- Design an SSP (linear single-step predictor): a feedforward network (FMLP) to predict P-VOPs and B-VOPs.
- Design an MSP (multi-step predictor) to estimate the source traffic level. This predictor is intended for on-line, real-time applications. In this thesis, the multi-step predictor is built on the recurrent network RFMP (Recurrent Feedforward Multilayer Perceptron).
- Apply the predictors to video sequences coded at different quantization levels and in different MPEG formats.
- Present the simulation software, developed in the Matlab environment.

III. DATE OF ASSIGNMENT: 15.02.2004
IV. DATE OF COMPLETION: 11.7.2004
V. SUPERVISOR: Assoc. Prof. Dr. Lê Tiến Thường
VI. REVIEWERS: Assoc. Prof. Dr. Nguyễn Hữu Phương, Dr. Nguyễn Đức Thành

SUPERVISOR        REVIEWER 1        REVIEWER 2

The content of the master's thesis proposal has been approved by the specialized council.
POSTGRADUATE SCIENCE MANAGEMENT OFFICE        Date: ...... month ...... year ......        HEAD OF DEPARTMENT

ACKNOWLEDGMENTS

I sincerely thank Assoc. Prof. Dr. Lê Tiến Thường, who guided me wholeheartedly and contributed many valuable ideas that helped me complete this thesis.
I thank the lecturers of the Telecommunications Division, Faculty of Electrical and Electronics Engineering, Ho Chi Minh City University of Technology, for their dedicated teaching and help throughout my undergraduate and graduate studies.
I thank my parents, who raised me and led me onto the path of study and learning.
I thank all the friends and relatives who encouraged and helped me during my studies and during the work on this thesis.

Ho Chi Minh City, [...] 2004
Eng. Lê Thanh Tân

Topic: Video Source Traffic Prediction. Supervisor: Assoc. Prof. Dr. Lê Tiến Thường

ABSTRACT

Nowadays, multimedia is responsible for an increasing fraction of network traffic, and this trend is continuing. Algorithms for predicting source traffic are therefore essential. They support the design of effective dynamic bandwidth allocation methods and the implementation of multimedia quality-of-service (QoS) control strategies. This thesis introduces an approach for improving predictors of MPEG-coded video source traffic, for use in single-step-ahead and multi-step-ahead prediction. Both the single-step predictor (SSP) and the multi-step predictor (MSP) are tested for accuracy on video streams coded with MPEG-1 or MPEG-4.

SUMMARY

Today, multimedia is a cause of growing source traffic, and this trend will continue. Algorithms for predicting source traffic therefore need to be studied and developed. These algorithms help dynamic bandwidth allocation and multimedia quality-of-service (QoS) control to be carried out effectively. The thesis proposes a method for improving the prediction of MPEG-coded video source traffic. The predictor is used for single-step (SSP) and multi-step (MSP) prediction. Both predictors are used to predict the source traffic of video coded with MPEG-1 and MPEG-4.

PROBLEM STATEMENT

Today, the need to predict the bit rate of video source traffic in multimedia networks arises from the following important reasons:
• It is convenient for dynamic bandwidth allocation.
• It enables quality-of-service control of real-time multimedia streams. Multimedia streams are transported over networks that offer no service guarantees, for example Internet Protocol (IP) networks.
⇒ In the near future, source traffic prediction will be an indispensable need in network computation and applications.
• In communications, the efficient use of bandwidth is a primary concern. To adapt flexibly to the bandwidth allocated to each end user, that user's source traffic must be predicted. At present, multimedia is what increases network traffic, and this trend continues to grow. With a good algorithm for predicting source traffic (the traffic generated by multimedia sources), dynamic bandwidth allocation methods can be designed and implemented effectively.
• As is known, all data packets must arrive at the receiver to be reassembled, and they must be in order. The solution is therefore a buffer at the receiver. Today, the buffer is being replaced by edge caching.

There are many difficulties in transporting multimedia streams (video or audio) over networks in real time. These are data streams carried without service guarantees. The difficulties stem from the delay-sensitive nature of multimedia content. Any method used to transport information over an IP network must accept a certain delay. The decoder uses this time to decode the information during playback. To overcome this, a buffer is used at the receiver. Today, edge caching is used instead of a buffer. The trade-off of this technique is that data is not delivered to the receiver in real time. Although many applications can tolerate a transmission delay, many others require real-time or near-real-time delivery, for example games, teleconferencing, and telephony. When designing and implementing quality-of-service control in IP networks for real-time or near-real-time applications, the greatest difficulty is knowing the source traffic, that is, the future values of the time series.

EXISTING RESEARCH RESULTS

• Only stochastic and statistical source models have been studied.
• Linear predictors have been applied to source traffic prediction, and only for I-frames and P-frames.
• There is no predictor yet for B-frames or for MPEG-4 video traces.

SCOPE OF THE THESIS

• Study the structure of the MPEG-1, MPEG-2, and MPEG-4 coding formats.
• Study the structure of neural networks; analyze the training algorithms and training modes.
• Design an SSP (linear single-step predictor) to predict the I-VOP, P-VOP, B-VOP, and moving-average time series.
• Design an MSP (multi-step predictor) to estimate the source traffic level. This predictor is intended for on-line, real-time applications. In this thesis, the multi-step predictor is built on the recurrent network RFMP (Recurrent Feedforward Multilayer Perceptron).
• Apply the predictors to video sequences coded at different quantization levels and in different MPEG formats.

THESIS OUTLINE

• Chapter 1: Introduction. A general introduction to video compression.
• Chapter 2: DCT (Discrete Cosine Transform) coding. Introduces the DCT and its application to video compression.
• Chapter 3: Video compression standards. Presents the JPEG and MPEG compression standards.
• Chapter 4: Overview of MPEG-4 compression. Presents MPEG-4 compression and the application of the MPEG-4 standard to video signals.
• Chapter 5: Overview of neural networks. A brief introduction to neural networks.
• Chapter 6: Introduction to neural networks. Presents how specific networks are created and how they are applied.
• Chapter 7: Program and results.
• Conclusion and future work.
• Appendix: the paper presented at ISASE 2004.

7.3.2.5 MSP (multi-step-ahead predictor) for the moving-average time series

Figure 10. Prediction of the moving-average time series of Silence of the Lambs (high quality) using
2-step-ahead prediction.

Table 15 presents the results for the high-quality video traces.

Table 15:
Trace            RMSE (%)   MAE (bytes)   MRE (%)
Aladdin          8.5        6523.2        18.2
ARD Talk         2.73       5724.2        2.63
Jurassic Park    3.5        8411.1        14.8
Lecture Room     32.4       6245.3        10.5
Star Wars        3.13       8044.4        14.2
Skiing           6.31       6245.4        17.5

7.3.2.6 Prediction results for MPEG-1 video

I-frame prediction with the adaptive learning rate network:

Table 16:
Trace         RMSE (%)   Journal
Star Wars     1.2        2.4
Terminator    2.1        6.7
Mr Bean       0.95       4.1

P-frame prediction with the gradient descent network:

Table 17:
Trace         RMSE (%)   Journal
Star Wars     22.91      16.8
Terminator    13.23      15.7
Mr Bean       10.85      15.8

CONCLUSION AND FUTURE WORK

This work was summarized in a paper presented at ISASE 2004 (The 2004 International Symposium on Advanced Science and Engineering). The paper is included in the appendix.

Conclusion

• The SSP (Single-Step-Ahead Predictor) predicts I-VOPs and I-frames well; all RMSE values are below 10%. For P-VOPs and P-frames the RMSE is higher, but the results are generally below 10%. The prediction of B-VOPs and B-frames is less accurate, although most results are still below 10%.
• To make dynamic bandwidth allocation more effective and to shorten the prediction time, the thesis proposes predicting the moving-average time series. The SSP gives good results on this series; most results have an RMSE below 10%.
• The thesis proposes a multi-step-ahead prediction (MSP) method. The program implements a two-step-ahead predictor. The aim is a program that runs in real time with faster prediction.
• The thesis trains the neural networks with several algorithms: Bayesian regularization, gradient descent, resilient backpropagation, adaptive learning rate, adaptive linear, and others. As a result, the learning speed and the network training parameters are improved, and the prediction error is reduced.
Notes on the individual networks and training algorithms:

Batch static network: trains quickly and suppresses noise well. It is typically run for 500 to 1500 epochs, and the network converges slowly.
Levenberg-Marquardt: converges quickly but uses a lot of memory, so a long input bit sequence causes an OUT OF MEMORY condition.
One Step Secant: needs less storage and computation, so it avoids the OUT OF MEMORY condition.
Quasi-Newton: converges quickly but requires complex computation. The network is suited to online training.
Bayesian regularization: a modified backpropagation that keeps the weights and biases small, which improves the generalization of the network. However, training takes a long time.
Adaptive learning rate: the learning rate changes depending on the gradient. Training is usually fast, which suits online training. However, it operates in batch training mode.
Gradient descent: responds slowly and needs to be trained in incremental training mode. However, the error in this training mode is often large.
Elman: a network with feedback, with a feedback path from the output of the hidden layer to the input. It therefore suppresses noise well, especially when the learning rate is small. If the learning rate is large, the network oscillates.

Future work

The thesis simulated the SSP (Single-Step-Ahead Predictor) for the I-VOP, P-VOP, B-VOP, and moving-average time series (or I-frames, P-frames, and B-frames), and the MSP (Multi-Step-Ahead Predictor) for the moving-average time series. However, the MSP results are not yet accurate and show oscillation, especially as the number of steps increases.
The thesis was carried out as a Matlab simulation. Future work is to implement it on a DSP kit and run it in real time.
A simulation for MPEG-7 would address future needs.

REFERENCES

[1] Bart Kosko, Neural Networks for Signal Processing, ISBN 0-13-614694-5.
[2] Tarun Khanna, Foundations of Neural Networks, ISBN 0-201-50036-1.
[3] Neural Network Toolbox User's Guide, © Copyright 1992-1998 by The MathWorks, Inc.
[4] Web pages:
http://www-tkn.ee.tu-berlin.de/research/trace/trace.html
http://www.garni.ch/bttvgrab
[5] Joan L. Mitchell, William B. Pennebaker, Chad E. Fogg, and Didier J. LeGall, MPEG Video Compression Standard, Chapman and Hall, New York, 1997.
[6] Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, Addison-Wesley, Reading, Mass., 1991.
[7] Peter Symes, Video Compression, McGraw-Hill, New York, 1997.
[8] Lecture of MPEG-4 structure, Department of Electrical Engineering, Hong Kong University of Science and Technology.
[9] Rob Koenen, International Organisation for Standardisation (Organisation Internationale de Normalisation), ISO/IEC JTC1/SC29/WG11, Coding of Moving Pictures and Audio, March 2001.
[10] Hong Zhao, Nirwan Ansari, and Yun Q. Shi, "Efficient predictive bandwidth allocation for real time video," IEICE Trans. Commun., vol. E86-B, no. 1, January 2003.
[11] Frank H. P. Fitzek and Martin Reisslein, MPEG-4 and H.263 Video Traces for Network Performance Evaluation, Berlin, October 2000.
[12] A. Chodorek and R. R. Chodorek, "An MPEG-2 video traffic prediction based on phase space analysis and its application to on-line dynamic bandwidth allocation," in Proc. 2nd Eur. Conf. Universal Multiservice Networks, Apr. 8-10, 2002, pp. 44-55.
[13] A. G. Parlos, O. T. Rais, and A. F. Atiya, "Multi-step-ahead prediction using dynamic recurrent neural networks," Neural Networks, vol. 13, pp. 765-786, Sept. 2000.
[14] Romuald Boné and Michel Crucianu, "An evaluation of constructive algorithms for recurrent networks on multi-step-ahead prediction," International Conference on Neural Information Processing (ICONIP'02), Singapore, November 2002, vol. 2, pp. 547-551.
[15] Alexander G. Parlos, Kil T. Chong, and Amir F. Atiya, "Application of the recurrent multilayer perceptron in modeling complex process dynamics," 1994.
[16] Thuong Le-Tien, Tan Le-Thanh, and Chi Nguyen-Duc, Neural Networks applied to prediction of video
source traffic, The 2004 International Symposium on Advanced Science and Engineering (ISASE 2004), HCMUT, Vietnam, May 2004.
[17] Đỗ Hoàng Tiến and Vũ Đức Lý, Truyền Hình Số [Digital Television], Science and Technology Publishing House, Hanoi, 2001.
[18] Nguyễn Kim Sách, Truyền Hình Số Có Nén Và Multimedia [Digital Television with Compression and Multimedia], Science and Technology Publishing House, Hanoi, 2001.
[19] Lê Thanh Tân and Lê Hoành Sử, [Improving the quality of satellite communication systems using wavelets and neural network equalization], undergraduate graduation thesis, 2002.

APPENDIX

NEURAL NETWORKS APPLIED TO PREDICTION OF VIDEO SOURCE TRAFFIC

Thuong Le-Tien, Tan Le-Thanh*, Chi Nguyen-Duc
Electrical and Electronic Engineering Department, HCMC University of Technology
*Electronic Engineering Department, HCMC University of Technical Education

ABSTRACT

Nowadays, multimedia is responsible for an increasing fraction of network traffic, and this trend is continuing. Algorithms for predicting source traffic are therefore essential. They support the design of effective dynamic bandwidth allocation methods and the implementation of multimedia quality-of-service (QoS) control strategies. This paper introduces an approach for improving predictors of MPEG-coded video source traffic, for use in single-step-ahead and multi-step-ahead prediction. Both the single-step predictor (SSP) and the multi-step predictor (MSP) are tested for accuracy on video streams coded with MPEG-1 or MPEG-4.

1 Introduction

The need to predict the bit rate of video source traffic arises from the following important reasons:
- Bandwidth allocation becomes more dynamic.
- Quality-of-service (QoS) control is needed for real-time multimedia streams that are transported over networks offering no service guarantees.
⇒ In the future, video source traffic prediction will be one of the most essential needs in network computation and applications.
- Efficient utilization of bandwidth is a topic attracting much attention in the networking community. To adapt the allocated bandwidth dynamically, the
prediction of video source traffic must be done by end-users. Today, multimedia is responsible for an increasing fraction of network traffic. With good algorithms for predicting video source traffic, efficient dynamic bandwidth allocation methods can be designed.

There are many problems in transporting multimedia streams in real time because of the delay-sensitive nature of multimedia content. Any method used to transport data over an IP network incurs a delay time t. The decoder uses this time to decode data during playback. To solve this problem, a buffer is used at the receiver. Today, edge caching is used instead of a buffer. The tradeoff of this approach is that the data does not arrive in real time, or even near real time. Although many media applications can accept large delays in delivery, many others require real-time or near-real-time delivery, for example gaming, conferencing, telephony, and teleoperation. The main problem faced when designing and implementing edge-based QoS control in IP networks for real-time applications is knowing the source traffic bit rate and its future values.

Reported research in multimedia source traffic prediction:
- Only the development of statistical and stochastic models has been studied.
- Linear predictors have been applied to multimedia source traffic, predicting only the sizes of I- and P-frames.

2 Research content

- The structures of MPEG video coding: MPEG-1, MPEG-2, and MPEG-4. MPEG-4 is the main subject of study.
- Designing and implementing Single-Step-Ahead Predictors (SSPs) that are approximated by a Recurrent Multilayer Perceptron (RMLP) and used to predict I-VOPs.
- Designing and implementing Single-Step-Ahead Predictors (SSPs) that are approximated by a Feedforward Multilayer Perceptron (FMLP) and used to predict P- and B-VOPs.
- Designing and implementing Multistep-Ahead Predictors (MSPs). The main application of this type of predictor
is on-line, real-time use. In this report, two- and four-step predictors are approximated by a Recurrent Multilayer Perceptron (RMLP).
- Applying all types of predictors to video traces with a variety of quantization levels, coded with both MPEG-1 and MPEG-4.

3 Overview of MPEG coding

The MPEG standards (Moving Picture Experts Group standards) are a series of video compression standards aimed at coding pictures and sound at 1.5 to 50 Mbps.

3.1 Types of frames defined by MPEG-1

- Intra frame (I-frame): contains all the data needed to restore the whole frame. I-frames allow random access, but their compression rate is very low.
- Predicted frame (P-frame): coded with motion compensation from the previous I-frame or P-frame. A P-frame has a higher compression rate than an I-frame and is itself used as a reference frame in motion compensation.
- Bi-directionally predicted frame (B-frame): coded with motion compensation from an I-frame, a previous P-frame, or a following P-frame. B-frames have the highest compression rate.
- Group of pictures (GOP): starts with an I-frame, followed by a sequence of P- and B-frames.

I B B P B B P B B P B B I
Fig. GOP structure

A GOP sequence is characterized by the parameters N and M. N is the distance between two consecutive I-frames and M is the distance between two consecutive anchor (I- or P-) frames; the pattern above has N = 12 and M = 3.

3.2 MPEG-4

The main purpose of MPEG-4 is coding audio and video at very low bit rates.
- Visual Object Plane (VOP): a VOP is an object of an MPEG-4 picture.
- One scene is represented by a sequence of VOPs. For a static object, however, one scene is represented by only one VOP.
- GOV: similar to the GOP in MPEG-1. The blocks of pictures in MPEG-4 data flows are encoded as I-, P-, and B-VOPs.

In this paper, the MPEG-4 bit stream was encoded from YUV information. The number of video objects was set to one, i.e., the entire scene is one video object. The display width is set to 176 pels, the height is set to 144 pels, and a pel
depth of [...] bits per pel is used. The single video object was encoded into a single video object layer, and the video object layer frame rate is set to 25 frames/sec. The GOV pattern was set to IBBPBBPBBPBB. Each video trace was encoded at three quality levels: low, medium, and high. For the low-quality encoding, the quantization parameters were fixed at 10 for I-VOPs, 14 for P-VOPs, and 18 for B-VOPs. For the medium-quality encoding, the quantization parameters for all three VOP types were fixed at 10. For the high-quality encoding, the quantization parameters for all three VOP types were fixed at 4.

4 Neural predictor

4.1 General structure

There are two basic neural structures: the Feedforward Multilayer Perceptron (FMLP) and the Recurrent Multilayer Perceptron (RMLP).
- The FMLP is the standard type of feedforward neural network.
- The RMLP is a multilayer network with many connections between the nodes of the hidden layers. These are called crosstalk connections. Their purpose is to increase the memory of the network, which suits systems with dynamic models.

4.2 Predictor

- Single-Step-Ahead Predictor (SSP): the value at t+1 is predicted from the values up to t:

$$\hat{y}(t+1 \mid t) = f\big(y(t), \dots, y(t-n_y+1),\; u(t), \dots, u(t-n_u+1)\big) \tag{1}$$

where u(t) is the input, y(t) is the output, and n_u, n_y are the maximum delays of the input and output respectively.

The following figure describes the SSP for predicting I-VOPs.

Fig. Neural network structure of the SSP for predicting I-VOPs (scaled inputs I(i-1), I(i-2), I(i-3), their differences δI and δ²I, the moving averages I_70(i-1) and I_200(i-1), and past errors e feed a 15-7-1 RMLP; the post-processed output is Î(i | i-1))

- Multistep-Ahead Predictor (MSP): the value at t+1 is predicted recursively from the values up to t-p+1, where p > 1:

$$\hat{y}(t+1 \mid t-p+1) = f\big(\hat{y}(t \mid t-p+1), \dots, \hat{y}(t-n_y+1 \mid t-p+1),\; u(t), \dots, u(t-n_u+1)\big) \tag{2}$$

where u(t) is the input, y(t) is the output, and n_u, n_y are the maximum delays of the input and output respectively.

5 Results

5.1 Three parameters used for assessing the predictor

The first assessment parameter is the relative mean square error (RMSE):

$$\mathrm{RMSE} = \frac{\sum_{j=1}^{N} \big(S(j) - \hat{S}(j)\big)^2}{\sum_{j=1}^{N} S(j)^2} \tag{3}$$

where S(j) and Ŝ(j) are the actual and predicted time series.

The second assessment parameter is the maximum absolute error (MAE), which measures the largest error:

$$\mathrm{MAE} = \max_{1 \le j \le N} \big| S(j) - \hat{S}(j) \big| \tag{4}$$

The third assessment parameter is the maximum relative error (MRE):

$$\mathrm{MRE} = \max_{1 \le j \le N} \frac{\big| S(j) - \hat{S}(j) \big|}{S(j)} \tag{5}$$

5.2 The results

5.2.1 SSP of I-VOPs

Because each video sequence was encoded at three different quantization levels, performance metrics are reported for each level. The RMLP networks, such as Elman networks and Hopfield networks, perform best. Elman networks are sometimes less reliable than other kinds of networks, because both training and adaptation use an approximation of the error gradient [4], but the Elman results here are well within acceptable limits. The Hopfield results are the same as the Elman results, so the results below are those of the predictor using Elman networks.

The RMLP network has three layers: 15 input nodes, 7 hidden nodes, and 1 output node. The inputs are the sizes of the previous three I-VOPs, the two consecutive first differences of the I-VOP sizes, the second difference of the I-VOP sizes, and the moving averages of the I-VOP sizes:

$$\delta I(i) = I(i) - I(i-1) \tag{6}$$

$$\delta^2 I(i) = \delta I(i) - \delta I(i-1) = I(i) - I(i-1) - I(i-1) + I(i-2) = I(i) - 2I(i-1) + I(i-2) \tag{7}$$

$$I_a(i) = \frac{1}{a} \sum_{j=i-a+1}^{i} I(j) \tag{8}$$

Fig. Original I-VOP sizes of the Silence of the Lambs trace (high quality)
Fig. Predicted I-VOP sizes of the Silence of the Lambs trace (high quality)

Table 1 displays the performance metrics of the SSP for I-VOP sizes of the high-quality traces:

Trace            RMSE (%)   MAE (bytes)   MRE (%)
Aladdin          2.4        8654.9        14
ARD Talk         0.9        6656.2        2.5
Jurassic Park    0.8        8891.2        7.6
Lecture Room     0.3        2677.5        0.8
Star Wars        1.6        4771.3        6.3
Skiing           2.0        4815.8        2.3

Table 2 displays the performance metrics of the SSP for I-VOP sizes of the medium-quality traces:

Trace            RMSE (%)   MAE (bytes)   MRE (%)
Aladdin          2.6        8948.1        15.1
ARD Talk         1.0        6671.5        2.6
Jurassic Park    0.8        8900.2        7.7
Lecture Room     0.36       2687.1        0.81
Star Wars        1.65       4782.7        6.34
Skiing           2.4        5048.1        2.4

Table 3 displays the performance metrics of the SSP for I-VOP sizes of the low-quality traces:

Trace            RMSE (%)   MAE (bytes)   MRE (%)
Aladdin          3.2        9398.4        17.3
ARD Talk         1.2        6857.2        2.8
Jurassic Park    0.95       8933.4        7.9

- The results of predicting I-frames of the MPEG-1 traces:

Table 4 displays the performance metrics of the SSP for I-frame sizes:

Trace         RMSE (%)
Star Wars     1.2
Terminator    2.1
Mr Bean       0.95

5.2.2 SSP of P-VOPs

As for the I-VOPs, the results of the SSP for P-VOPs are reported for each of the three quality levels. FMLP networks are used, with a range of training algorithms: basic gradient descent, adaptive learning rate, resilient backpropagation, the BFGS quasi-Newton method, the one step secant method, the Levenberg-Marquardt algorithm, and Bayesian regularization.

One problem that can occur when training neural networks is that the network overfits the training set and does not generalize well to new data outside it. This can be prevented by using early stopping with any of the training routines, which requires the user to pass a validation set to the training algorithm in addition to the standard training set [3].

Because the Bayesian regularization algorithm is a modification of the Levenberg-Marquardt training algorithm, it has the good characteristics of Levenberg-Marquardt along with some additional ones. It can produce networks that generalize well and reduce the difficulty of
determining the optimum network architecture. It is also described as the fastest training algorithm, because it has a memory-reduction feature for use when the training set is large [4]. The results below are those of the neural network predictor trained with the Bayesian regularization algorithm.

Table 5 displays the performance metrics of the SSP for P-VOP sizes of the high-quality traces:

Trace            RMSE (%)   MAE (bytes)   MRE (%)
Aladdin          10.4       10094.8       11.4
ARD Talk         5.3        10066.1       5.5
Jurassic Park    3.8        12331.8       5.7
Lecture Room     6.3        2513.2        5.08
Star Wars        8.9        9147.3        11.6
Skiing           6.05       4841.8        22.3

Table 6 displays the performance metrics of the SSP for P-VOP sizes of the medium-quality traces:

Trace            RMSE (%)   MAE (bytes)   MRE (%)
Aladdin          10.6       10089.1       12.1
ARD Talk         5.41       10097.5       5.4
Jurassic Park    3.85       12380.2       5.73

Table 7 displays the performance metrics of the SSP for P-VOP sizes of the low-quality traces:

Trace            RMSE (%)   MAE (bytes)   MRE (%)
Aladdin          11.0       10090.4       11.3
ARD Talk         5.2        10067.2       5.8
Jurassic Park    3.95       12389.4       5.69

- The results of predicting P-frames of the MPEG-1 coded video traces:

Table 8 displays the performance metrics of the SSP for P-frame sizes:

Trace         RMSE (%)
Star Wars     22.91
Terminator    13.23
Mr Bean       10.85

5.2.3 SSP of B-VOPs

FMLP networks are used, with the same range of training algorithms: basic gradient descent, adaptive learning rate, resilient backpropagation, the BFGS quasi-Newton method, the one step secant method, the Levenberg-Marquardt algorithm, and Bayesian regularization. The composite time series consisting of I-VOPs and P-VOPs is used as the input to the neural network. This time series can be represented as IPPPIPPPI…

Fig. Original B-VOP sizes of the Jurassic Park trace
Fig. Predicted B-VOP sizes of the Jurassic Park trace

Table 9 displays the performance metrics of the SSP for B-VOP sizes of the high-quality traces:

Trace            RMSE (%)   MAE (bytes)   MRE (%)
Aladdin          7.84       8233.8        14.4
ARD Talk         3.03       4655.1        35.5
Jurassic Park    2.38       8312.1        65.2
Lecture Room     26.03      790.2         4.8
Star Wars        3.45       4124.3        24.1
Skiing           13.05      1814.4        27.2

5.2.4 SSP of moving average time-series of VOPs

Table 10 displays the performance metrics of the SSP for the moving-average time series of VOP sizes of the high-quality traces:

Trace            RMSE (%)   MAE (bytes)   MRE (%)
Aladdin          3.5        8523.2        19.2
ARD Talk         0.73       7024.2        2.97
Jurassic Park    1.54       7541.1        11.8
Lecture Room     12.3       7250.3        11.2
Star Wars        1.36       7841.4        18.2
Skiing           3.21       7825.4        22.5

6 Conclusion

- The SSP performs very well on I-VOPs and I-frames; the RMSE is always below 4%. For P-VOPs and P-frames the performance is good enough; in general, the RMSE values are lower than 10%. The prediction of B-VOPs and B-frames is worse than that of I-VOPs and P-VOPs.
- The most successful part of the work is the use of several training algorithms for feedforward networks, such as Bayesian regularization and adaptive linear. These improve the learning speed and the network parameters, so the prediction errors are reduced to acceptable levels compared with previous reports.
- Some comments on the feedforward training algorithms:
Basic gradient descent algorithm: it responds slowly, so it must be used in incremental training mode. It is not acceptable for online training in particular, because it may produce unexpectedly large errors in incremental training mode.
Adaptive learning rate algorithm: it trains quickly, but it can only be used in batch training mode.
- This paper uses not only the Feedforward Multilayer Perceptron but also recurrent networks such as Elman networks and Hopfield networks.
Elman networks can learn to detect and generate temporal patterns because they have an internal feedback loop, which makes them useful for prediction. Because Elman networks are an extension of the two-layer sigmoid/linear architecture, they inherit the ability to fit any input/output function with a finite number of discontinuities. They are also able to fit
temporal patterns, but may need many neurons in the recurrent layer to fit a complex function. The results above give the best experimental evidence of this.
Hopfield networks can act as error correction or vector categorization networks. Input vectors are used as the initial conditions of the network, which updates recurrently until it reaches a stable output vector. Hopfield networks are interesting from a theoretical standpoint but are seldom used in practice. Even the best Hopfield designs may have spurious stable points that lead to incorrect answers. More efficient and reliable error correction techniques, such as backpropagation, are available.

References

[1] Bart Kosko, Neural Networks for Signal Processing, ISBN 0-13-614694-5.
[2] Tarun Khanna, Foundations of Neural Networks, ISBN 0-201-50036-1.
[3] Neural Network Toolbox User's Guide, © Copyright 1992-1998 by The MathWorks, Inc.
[4] Chien Hoang Dinh, Tan Le Thanh, and Su Le Hoanh, graduation thesis for the Bachelor degree: "Using Neural Networks and Wavelets for Noise Suppression in Satellite Communication Systems," Ho Chi Minh City University of Technology, 1997.
[5] Thuong Le-Tien and Tan Le-Thanh, graduation thesis for the Master degree: "Neural Networks Applied to Prediction of Video Source Traffic," Ho Chi Minh City University of Technology, 2004.
[6] Thuong Le-Tien and Uyen Vo Thi Hang, graduation thesis for the Master degree: "Wavelets Applied to Encode Video at Very Low Rate," Ho Chi Minh City University of Technology, 2003.
[7] Web pages:
http://www-tkn.ee.tu-berlin.de/research/trace/trace.html
http://www.garni.ch/bttvgrab
[8] Papers in IEEE magazine in 1998, 1999, 2000.
[9] Joan L. Mitchell, William B. Pennebaker, Chad E. Fogg, and Didier J. LeGall, MPEG Video Compression Standard, Chapman and Hall, New York, 1997.
[10] Peter Symes, Video Compression, McGraw-Hill, New York, 1997.
[11] Tien Do-Hoang and Ly Vu-Duc, Digital Television, Ha Noi Science and Technology Publishing
House, 2001.
[12] Sach Nguyen-Kim, Digital Television with Compression and Multimedia, Ha Noi Science and Technology Publishing House, 2001.
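The three assessment parameters of Section 5.1 (Eqs. 3 to 5) can be sketched in a few lines. The thesis reports a Matlab simulation; the Python below is only an illustrative stand-in, and the function names are hypothetical. It assumes that the terms in Eq. (3) are squared, since the exponents are not legible in the source, and that the percentages in the tables would correspond to these ratios multiplied by 100.

```python
from typing import Sequence


def relative_mean_square_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Eq. (3): sum of squared errors divided by the sum of squared actual values.

    Assumption: squared terms in numerator and denominator.
    """
    numerator = sum((s - p) ** 2 for s, p in zip(actual, predicted))
    denominator = sum(s ** 2 for s in actual)
    return numerator / denominator


def maximum_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Eq. (4): largest absolute difference between actual and predicted sizes."""
    return max(abs(s - p) for s, p in zip(actual, predicted))


def maximum_relative_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Eq. (5): largest absolute error relative to the actual size (actual must be nonzero)."""
    return max(abs(s - p) / s for s, p in zip(actual, predicted))


if __name__ == "__main__":
    # Made-up frame sizes in bytes, for illustration only (not thesis data).
    actual_sizes = [100.0, 200.0, 400.0]
    predicted_sizes = [110.0, 190.0, 400.0]
    print("RMSE:", relative_mean_square_error(actual_sizes, predicted_sizes))
    print("MAE :", maximum_absolute_error(actual_sizes, predicted_sizes))
    print("MRE :", maximum_relative_error(actual_sizes, predicted_sizes))
```

The three functions take the actual and predicted series in the same order as S(j) and Ŝ(j) in the equations, so they can be applied directly to any pair of equal-length traces.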

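The predictor inputs of Eqs. (6) to (8) and the recursive idea behind Eq. (2) can also be sketched briefly. This is a minimal Python illustration under stated assumptions, not the thesis's Matlab implementation: the function names are hypothetical, and `one_step` stands for any trained single-step predictor (here replaced by a naive linear extrapolation purely so the example runs).

```python
from typing import Callable, List, Sequence


def first_difference(sizes: Sequence[float], i: int) -> float:
    """Eq. (6): delta I(i) = I(i) - I(i-1)."""
    return sizes[i] - sizes[i - 1]


def second_difference(sizes: Sequence[float], i: int) -> float:
    """Eq. (7): delta^2 I(i) = I(i) - 2 I(i-1) + I(i-2)."""
    return sizes[i] - 2 * sizes[i - 1] + sizes[i - 2]


def moving_average(sizes: Sequence[float], i: int, a: int) -> float:
    """Eq. (8): mean of the a most recent sizes ending at index i."""
    if a <= 0 or i - a + 1 < 0:
        raise ValueError("window does not fit inside the series")
    return sum(sizes[i - a + 1 : i + 1]) / a


def recursive_forecast(
    history: Sequence[float],
    one_step: Callable[[Sequence[float]], float],
    steps: int,
) -> List[float]:
    """Multi-step prediction by feeding each one-step estimate back as input."""
    extended = list(history)
    forecasts = []
    for _ in range(steps):
        estimate = one_step(extended)
        forecasts.append(estimate)
        extended.append(estimate)  # the estimate stands in for the unseen value
    return forecasts


def naive_one_step(values: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained network: linear extrapolation."""
    return values[-1] + (values[-1] - values[-2])


if __name__ == "__main__":
    # Made-up I-VOP sizes, for illustration only (not thesis data).
    sizes = [10.0, 12.0, 15.0, 19.0, 24.0]
    print(first_difference(sizes, 4), second_difference(sizes, 4))
    print(moving_average(sizes, 4, 2))
    print(recursive_forecast(sizes, naive_one_step, 2))
```

Feeding estimates back as inputs is also why errors can accumulate as the number of steps grows, which is consistent with the oscillation the thesis reports for the MSP.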