Detection of interesting events in movies using only the audio signal


DUBLIN CITY UNIVERSITY
SCHOOL OF ELECTRONIC ENGINEERING

Detection of Interesting Events in Movies using only the Audio Signal

PHAM MINH LUAN NGUYEN
August 2009

MASTER OF ENGINEERING IN TELECOMMUNICATIONS
Supervised by Dr Sean Marlow

Acknowledgements

I would like to thank my supervisor, Dr Sean Marlow, for his extensive guidance, enthusiasm and commitment to this project. Thanks are also due to Dr David Sadlier for his support with movies and code, and to all other friends and colleagues for their contributions.

Declaration

I hereby declare that, except where otherwise indicated, this document is entirely my own work and has not been submitted in whole or in part to any other university.

Signed:                Date:

Abstract

The imminent rapid expansion in the movie industry is driving the need for efficient digital video indexing, browsing and playback systems. This report develops an automatic detection system that finds exciting events directly in the original movie using only the audio signal. Interesting events in movies are typically flagged by high audio amplitude, so detecting them from the audio amplitude is an efficient approach. It is also fast, because audio features are computationally cheaper than visual features. The detected highlight events are then classified in order to evaluate the automatic system.

Contents

ACKNOWLEDGEMENTS
DECLARATION
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF GRAPHS
LIST OF TABLES
CHAPTER 1 - INTRODUCTION
  1.1 RELATED WORK
    1.1.1 Automatically Selecting Shots for Action Movie Trailers
    1.1.2 Voice Processing for Automatic TV Sports Program Highlights Detection
    1.1.3 Audio/visual analysis for high-speed TV advertisement detection from MPEG bitstream
  1.2 EXCITING EVENT DETECTION IN MOVIE USING AUDIO SIGNAL
CHAPTER 2 - MPEG-1 AUDIO/VIDEO STANDARD
  2.1 OVERVIEW
  2.2 MPEG-1 LAYER AUDIO
CHAPTER 3 - MOVIE HIGHLIGHT DETECTION
  3.1 GETTING GROUND TRUTH
  3.2 AUTOMATIC DETECTION
    3.2.1 Getting Scale Factor
    3.2.2 Audio amplitude threshold
CHAPTER 4 - RESULTS AND ANALYSIS
  4.1 RESULTS
    4.1.1 The average audio amplitude
    4.1.2 The audio amplitude threshold time
    4.1.3 Results and result tables
  4.2 PRECISION AND RECALL
CHAPTER 5 - CONCLUSIONS AND FURTHER WORK
  5.1 SYSTEM EVALUATION
  5.2 FURTHER WORK
REFERENCES

List of Figures

FIGURE 2-1: ISO/MPEG-1 LAYER I/II ENCODER
FIGURE 2-2: STRUCTURE OF LAYER-II SUBBAND SAMPLES
FIGURE 2-3: THE DATA BITSTREAM STRUCTURE OF LAYER-II
FIGURE 3-1: MPEG-1 LAYER-II FREQUENCY SUBBANDS
FIGURE 3-2: VIDEO FRAME AUDIO LEVELS GENERATED FROM SCALEFACTORS CORRESPONDING TO TEMPORALLY ASSOCIATED AUDIO

List of Graphs

GRAPH 3-1: PER-FRAME AUDIO AMPLITUDE LEVEL FOR EXAMPLE MOVIE
GRAPH 3-2: PER-SECOND AUDIO AMPLITUDE LEVEL FOR EXAMPLE MOVIE
GRAPH 3-3: AUDIO AMPLITUDE PROFILE OF THE NIGHT AT THE MUSEUM
GRAPH 3-4: AUDIO AMPLITUDE DETECTION OF THE NIGHT AT THE MUSEUM
GRAPH 3-5: AUDIO AMPLITUDE DETECTION OF THE NIGHT AT THE MUSEUM AND GROUND TRUTH (BLUE IS AUTOMATIC DETECTION, RED IS THE GROUND TRUTH)
GRAPH 3-6: AUDIO AMPLITUDE PROFILE OF THE KINGDOM
GRAPH 3-7: AUDIO AMPLITUDE DETECTION OF THE KINGDOM
GRAPH 3-8: AUDIO AMPLITUDE DETECTION OF THE KINGDOM AND GROUND TRUTH
GRAPH 3-9: AUDIO AMPLITUDE PROFILE OF THE LEGEND OF BUTCH AND SUNDANCE
GRAPH 3-10: AUDIO AMPLITUDE DETECTION OF THE LEGEND OF BUTCH AND SUNDANCE
GRAPH 3-11: COMPARE RESULT AUTOMATIC DETECTION AND GROUND TRUTH
GRAPH 3-12: AUDIO AMPLITUDE PROFILE (NIGHT AT THE MUSEUM - ONE FRAME)
GRAPH 3-13: AUTOMATIC DETECTION AND GROUND TRUTH (NIGHT AT THE MUSEUM - ONE FRAME)
GRAPH 3-14: AUDIO AMPLITUDE PROFILE (NIGHT AT THE MUSEUM - TWO FRAMES)
GRAPH 3-15: AUTOMATIC DETECTION AND GROUND TRUTH (NIGHT AT THE MUSEUM - TWO FRAMES)
GRAPH 3-16: AUDIO AMPLITUDE PROFILE (NIGHT AT THE MUSEUM - TWO SECONDS)
GRAPH 3-17: AUTOMATIC DETECTION AND GROUND TRUTH (NIGHT AT THE MUSEUM - TWO SECONDS)
GRAPH 3-18: AUDIO AMPLITUDE PROFILE (NIGHT AT THE MUSEUM - FOUR SECONDS)
GRAPH 3-19: AUTOMATIC DETECTION AND GROUND TRUTH (NIGHT AT THE MUSEUM - FOUR SECONDS)
GRAPH 3-20: AUDIO AMPLITUDE PROFILE (THE KINGDOM - ONE FRAME)
GRAPH 3-21: AUTOMATIC DETECTION AND GROUND TRUTH (THE KINGDOM - ONE FRAME)
GRAPH 3-22: AUDIO AMPLITUDE PROFILE (THE KINGDOM - TWO FRAMES)
GRAPH 3-23: AUTOMATIC DETECTION AND GROUND TRUTH (THE KINGDOM - TWO FRAMES)
GRAPH 3-24: AUDIO AMPLITUDE PROFILE (THE KINGDOM - TWO SECONDS)
GRAPH 3-25: AUTOMATIC DETECTION AND GROUND TRUTH (THE KINGDOM - TWO SECONDS)
GRAPH 3-26: AUDIO AMPLITUDE PROFILE (THE KINGDOM - FOUR SECONDS)
GRAPH 3-27: AUTOMATIC DETECTION AND GROUND TRUTH (THE KINGDOM - FOUR SECONDS)
GRAPH 3-28: AUDIO AMPLITUDE PROFILE (THE LEGEND OF BUTCH AND SUNDANCE - ONE FRAME)
GRAPH 3-29: AUTOMATIC DETECTION AND GROUND TRUTH (THE LEGEND OF BUTCH AND SUNDANCE - ONE FRAME)
GRAPH 3-30: AUDIO AMPLITUDE PROFILE (THE LEGEND OF BUTCH AND SUNDANCE - TWO FRAMES)
GRAPH 3-31: AUTOMATIC DETECTION AND GROUND TRUTH (THE LEGEND OF BUTCH AND SUNDANCE - TWO FRAMES)
GRAPH 3-32: AUDIO AMPLITUDE PROFILE (THE LEGEND OF BUTCH AND SUNDANCE - TWO SECONDS)
GRAPH 3-33: AUTOMATIC DETECTION AND GROUND TRUTH (THE LEGEND OF BUTCH AND SUNDANCE - TWO SECONDS)
GRAPH 3-34: AUDIO AMPLITUDE PROFILE (THE LEGEND OF BUTCH AND SUNDANCE - FOUR SECONDS)
GRAPH 3-35: AUTOMATIC DETECTION AND GROUND TRUTH (THE LEGEND OF BUTCH AND SUNDANCE - FOUR SECONDS)

List of Tables

TABLE 3-1: GROUND TRUTH OF NIGHT AT THE MUSEUM
TABLE 3-2: GROUND TRUTH OF THE KINGDOM
TABLE 3-3: GROUND TRUTH OF THE KINGDOM (CONTINUED)
TABLE 3-4: GROUND TRUTH OF THE LEGEND OF BUTCH AND SUNDANCE
TABLE 3-5: GROUND TRUTH OF THE LEGEND OF BUTCH AND SUNDANCE (CONTINUED)
TABLE 4-1: COMPARE RESULTS BETWEEN THE AUTOMATIC SYSTEM AND THE GROUND TRUTH
TABLE 4-2: POSSIBLE EXCITING EVENTS ARE DETECTED BY AUTOMATIC SYSTEM
TABLE 4-3: GROUND TRUTH EVENTS MISSED IN AUTOMATIC SYSTEM
TABLE 4-4: COMPARE RESULTS BETWEEN THE AUTOMATIC SYSTEM AND THE GROUND TRUTH
TABLE 4-5: POSSIBLE EXCITING EVENTS ARE DETECTED BY AUTOMATIC SYSTEM
TABLE 4-6: COMPARE RESULTS BETWEEN THE AUTOMATIC SYSTEM AND THE GROUND TRUTH
TABLE 4-7: POSSIBLE EXCITING EVENTS ARE DETECTED BY AUTOMATIC SYSTEM
TABLE 4-8: GROUND TRUTH EVENTS MISSED IN AUTOMATIC SYSTEM
TABLE 4-9: PRECISION AND RECALL VALUES FOR THREE MOVIES

Chapter 1 - Introduction

The growing availability of video content creates a strong requirement for efficient tools to manage or access multimedia data [3]. Considerable progress has been made in audio analysis for movie content, with automatic highlight detection being one of the targets of recent research. Highlight detection is important because highlights provide the user with a short version of the movie that ideally contains all the information needed to understand the content. Hence, the user may quickly evaluate the movie as interesting or not.

Audio, which includes voice, music, and various kinds of environmental sounds, is an important type of media, and also a significant part of audiovisual data. As more and more digital audio databases come into use, people are realizing the importance of effective audio database management that relies on audio content analysis. Audio segmentation and classification have applications in professional media production, audio archive management, commercial music usage, surveillance, and so on. Furthermore, audio content analysis may play a primary role in video annotation. Current approaches to video segmentation and indexing mostly focus on the visual information. However, visual-based processing often leads to a far too fine segmentation of the audiovisual sequence. Integrating the diverse multimedia components (audio, visual, and textual information) will be essential in achieving a fully functional system for video parsing.

Existing research on content-based audio data management is very limited. There are in general four directions [6]. The first is audio segmentation and classification, where one basic problem is speech/music discrimination. The second is audio retrieval, where one specific technique in content-based retrieval is query-by-humming. The third is audio analysis for video indexing. The fourth is the integration of audio and visual information for video segmentation and indexing.

Graph 3-32: Audio amplitude profile (The Legend of Butch and Sundance - two seconds)
Graph 3-33: Automatic detection and Ground Truth (The Legend of Butch and Sundance - two seconds)
Graph 3-34: Audio amplitude profile (The Legend of Butch and Sundance - four seconds)
Graph 3-35: Automatic detection and Ground Truth (The Legend of Butch and Sundance - four seconds)

Chapter 4 - Results and analysis

4.1 Results

4.1.1 The average audio amplitude

The results are quite good in this case: the method picks up almost all of the events with high audio amplitude. However, it fails in two ways. It misses exciting events that have a low audio amplitude, and it flags events that have a high audio amplitude but are not exciting. To address this, a second method was tested: the audio amplitude threshold time.

The Ground Truth is the other factor. It was compiled from one person's opinion and may therefore be only a subjective view. The Ground Truth needs to be compiled by several people to be more reliable.

4.1.2 The audio amplitude threshold time

There are twenty-five frames in one second. The previous method simply averages the audio amplitude of twenty-five frames to obtain the audio amplitude for one second. This may miss some events, e.g. an event that lasts only one to three frames. Conversely, the detection time can be reduced by increasing the threshold time. The experiments give quite good results in the one-frame and two-frame cases. The other cases do not give good results.

4.1.3 Results and result tables

• Movie 1: Night at the Museum
  Ground Truth: 22 events
  Detected events in Ground Truth: 20 events
  Suggested events by system (not in Ground Truth): 13 events
  Missed Ground Truth events: 2 events
• Movie 2: The Kingdom
  Ground Truth: 30 events
  Detected events in Ground Truth: 30 events
  Suggested events by system (not in Ground Truth): 10 events
  Missed Ground Truth events: 0
• Movie 3: The Legend of Butch and Sundance
  Ground Truth: 29 events
  Detected events in Ground Truth: 28 events
  Suggested events by system (not in Ground Truth): 10 events
  Missed Ground Truth events: 1 event

Event number | Classified events | Ground Truth events, hour.minute.second (seconds) | Events detected by automatic system (seconds)
1 | Music and name of movie | 00.01.17 - 00.01.22 (77 - 88) | None detected
2 | Loud noise, scream, dump | 00.08.31 - 00.09.10 (531 - 550) | 518 - 560
3 | Loud voice, scream | 00.15.26 - 00.15.49 (926 - 949) | 926 - 958
4 | Loud noise | 00.24.20 - 00.24.40 (1460 - 1480) | 1460 - 1480
5 | Buster, scream, drumbeat | 00.26.20 - 00.27.41 (1580 - 1661) | 1588 - 1614, 1638 - 1660
6 | Drumbeat, buster, cracker, wham, fighting, sound of spear flying | 00.30.00 - 00.32.56 (1800 - 1976) | 1848 - 1900, 1918 - 1938, 1966 - 1986
7 | Scream, fighting | 00.35.00 - 00.35.36 (2100 - 2136) | 2122 - 2244
8 | Sound of water flowing | 00.49.00 - 00.49.20 (2940 - 2960) | 2908 - 2974
9 | Scream, squeak | 00.56.44 - 00.57.14 (3404 - 3434) | 3314 - 3404, 3420 - 3444
10 | Scream, yell, charivari | 00.58.58 - 00.59.12 (3538 - 3552) | 3530 - 3602
11 | Scream, speech | 01.01.56 - 01.02.09 (3716 - 3729) | 3704 - 3780
12 | Loud voice | 01.03.30 - 01.04.30 (3810 - 3870) | 3844 - 3866
13 | Whirr, scream, music | 01.07.07 - 01.07.40 (4027 - 4060) | 3984 - 4134
14 | Alarm, scream, shouting | 01.07.50 - 01.08.40 (4070 - 4120) | 3984 - 4134
15 | Scream, drum-beat, crunch, clump, crash, footstep, loud noise | 01.14.20 - 01.16.47 (4460 - 4607) | 4494 - 4632
16 | Trumpet-call, battle-cry | 01.17.36 - 01.17.57 (4656 - 4677) | 4658 - 4677
17 | Beating, smack | 01.19.58 - 01.20.13 (4798 - 4813) | None detected
18 | Drum beating, fighting | 01.21.11 - 01.21.40 (4871 - 4900) | 4844 - 4960
19 | Shouting, drum beating, fighting | 01.21.47 - 01.22.39 (4907 - 4959) | 4844 - 4960
20 | Crash, beating, smack | 01.23.32 - 01.23.53 (5012 - 5033) | 5002 - 5024, 5032 - 5130
21 | Drumbeating, shouting | 01.24.09 - 01.24.56 (5049 - 5096) | 5032 - 5130
22 | Roaring | 01.31.49 - 01.32.09 (5509 - 5529) | 5524 - 5552

Table 4-1: Comparison of results between the automatic system and the Ground Truth (Night at the Museum 2)

Event number | Possible exciting events detected by automatic system (not in Ground Truth) | Event location in movie (automatic system), hour.minute.second (seconds) | Length of event (seconds)
1 | Loud voice, applause, shouting, speech | 00.03.20 - 00.05.00 (200 - 300) | 100
2 | Loud voice, shouting, speech | 00.10.04 - 00.10.24 (604 - 624) | 20
3 | Loud voice, scream, drumbeat | 00.28.20 - 00.30.00 (1700 - 1800) | 100
4 | Loud voice, scream | 00.44.21 - 00.44.48 (2661 - 2688) | 27
5 | Loud noise, scream | 00.53.39 - 00.53.59 (3219 - 3239) | 20
6 | Loud voice, scream, shouting | 00.55.13 - 00.56.39 (3313 - 3399) | 86
7 | Music, speech | 01.00.44 - 01.01.06 (3644 - 3666) | 22
8 | Loud voice, shouting | 01.04.53 - 01.05.13 (3893 - 3913) | 20
9 | Loud voice, drumbeat | 01.06.19 - 01.06.39 (3979 - 3999) | 20
10 | Crashing, smash | 01.09.40 - 01.10.14 (4180 - 4214) | 34
11 | Loud voice, speech | 01.12.15 - 01.12.35 (4335 - 4355) | 20
12 | Music, sound of machine | 01.26.43 - 01.27.03 (5163 - 5223) | 20
13 | Music, shouting | 01.35.21 - 01.36.09 (5721 - 5769) | 48

Table 4-2: Possible exciting events detected by the automatic system (Night at the Museum 2)

Event number | Ground Truth events missed by automatic system | Event location in movie (Ground Truth), hour.minute.second (seconds) | Length of event (seconds)
1 | Music and name of movie | 00.01.17 - 00.01.22 (77 - 88) | 11
2 | Beating, smack | 01.19.58 - 01.20.13 (4798 - 4813) | 15

Table 4-3: Ground Truth events missed by the automatic system (Night at the Museum 2)

Event number | Classified events | Ground Truth events, hour.minute.second (seconds) | Events detected by automatic system (seconds)
1 | Music and name of movie | 00.00.39 - 00.00.56 (39 - 56) | 40 - 60
2 | Speech, drumbeat | 00.00.58 - 00.03.51 (58 - 231) | 74 - 104, 128 - 148, 170 - 190
3 | Gunshot | 00.06.56 - 00.07.00 (416 - 420) | 408 - 428
4 | Gunshot, machine-gun shot | 00.07.30 - 00.08.00 (450 - 480) | 422 - 554
5 | Gunshot, machine-gun shot | 00.08.10 - 00.09.04 (490 - 544) | 422 - 554
6 | Loud ambulance, voice, shouting | 00.09.28 - 00.10.33 (578 - 633) | 578 - 634
7 | Explosion | 00.11.40 - 00.11.52 (700 - 712) | 654 - 718
8 | Speech, beating | 00.14.50 - 00.15.30 (890 - 930) | 856 - 896, 908 - 927
9 | Speech, beating | 00.15.42 - 00.16.20 (942 - 980) | 958 - 978
10 | Whistle, sound of wheel brake | 00.31.15 - 00.32.54 (1875 - 1974) | 1874 - 1916, 1934 - 1954
11 | Sound of wheel brake | 00.33.00 - 00.33.30 (1980 - 2010) | 1972 - 2012
12 | Gunshot, crashing | 00.47.20 - 00.47.45 (2840 - 2865) | 2828 - 2876
13 | Alarm, drumbeat | 00.51.30 - 00.52.30 (3090 - 3150) | 3108 - 3128
14 | Shouting, beating, machine-gun shot | 00.52.56 - 00.53.10 (3176 - 3190) | 3166 - 3212
15 | Drumbeat, scraping, gunshot, explosion | 01.14.45 - 01.15.15 (4485 - 4515) | 4484 - 4508
16 | Explosion, crashing | 01.19.50 - 01.20.30 (4790 - 4830) | 4782 - 5090
17 | Shouting, beating, gunshot | 01.20.44 - 01.21.47 (4844 - 4907) | 4782 - 5090
18 | Crash, beating | 01.22.00 - 01.22.12 (4920 - 4932) | 4782 - 5090
19 | Explosion, crashing, smash | 01.25.12 - 01.25.23 (5112 - 5123) | 5104 - 5380
20 | Explosion, gunshot, shouting | 01.25.30 - 01.25.44 (5130 - 5144) | 5104 - 5380
21 | Explosion, gunshot | 01.26.14 - 01.26.54 (5174 - 5214) | 5104 - 5380
22 | Gunshot | 01.27.00 - 01.27.10 (5220 - 5230) | 5104 - 5380
23 | Gunshot | 01.27.12 - 01.28.00 (5232 - 5280) | 5104 - 5380
24 | Gunshot, explosion | 01.28.05 - 01.28.26 (5285 - 5306) | 5104 - 5380
25 | Gunshot | 01.30.55 - 01.31.12 (5455 - 5472) | 5450 - 5672
26 | Gunshot, explosion | 01.31.24 - 01.31.42 (5484 - 5502) | 5450 - 5672
27 | Scream, gunshot, shouting | 01.31.46 - 01.32.02 (5506 - 5522) | 5450 - 5672
28 | Scream, gunshot | 01.32.47 - 01.33.05 (5567 - 5585) | 5450 - 5672
29 | Shouting, gunshot, beating | 01.33.10 - 01.33.56 (5590 - 5636) | 5450 - 5672
30 | Gunshot, shouting | 01.36.44 - 01.37.07 (5804 - 5827) | 5800 - 5876

Table 4-4: Comparison of results between the automatic system and the Ground Truth (The Kingdom)

Event number | Possible exciting events detected by automatic system (not in Ground Truth) | Event location in movie (automatic system), hour.minute.second (seconds) | Length of event (seconds)
1 | Applause, shouting | 00.03.56 - 00.04.28 (236 - 268) | 32
2 | Applause, shouting | 00.05.14 - 00.05.36 (314 - 336) | 22
3 | Applause, shouting | 00.06.20 - 00.06.40 (380 - 400) | 20
4 | Loud voice, drumbeating | 00.29.09 - 00.29.41 (1749 - 1781) | 32
5 | Speech, drumbeating | 00.59.21 - 00.59.59 (3561 - 3599) | 38
6 | Loud voice | 01.00.41 - 01.01.01 (3641 - 3661) | 20
7 | Loud voice, shouting | 01.01.28 - 01.01.55 (3688 - 3715) | 27
8 | Alarm, shouting, drumbeating | 01.15.34 - 01.16.11 (4534 - 4571) | 37
9 | Loud voice, speech | 01.17.54 - 01.18.17 (4674 - 4697) | 23
10 | Whistle, speech | 01.18.30 - 01.18.50 (4710 - 4730) | 20

Table 4-5: Possible exciting events detected by the automatic system (The Kingdom)

Event number | Classified events | Ground Truth events, hour.minute.second (seconds) | Events detected by automatic system (seconds)
1 | Gunshot, name of movie | 00.00.19 - 00.00.29 (19 - 29) | 12 - 110
2 | Gunshot, shouting, yell | 00.01.00 - 00.02.12 (60 - 132) | 12 - 110
3 | Speech | 00.04.00 - 00.04.30 (240 - 270) | 248 - 272
4 | Gunshot | 00.05.09 - 00.05.40 (309 - 340) | 278 - 352
5 | Gunshot, shouting, hoofbeat | 00.07.54 - 00.08.10 (474 - 490) | 476 - 496
6 | Gunshot, hoofbeat | 00.10.05 - 00.11.10 (605 - 670) | 584 - 706
7 | Loud voice, shouting | 00.12.23 - 00.13.20 (743 - 800) | 750 - 808
8 | Shouting | 00.14.23 - 00.14.50 (863 - 890) | 860 - 898
9 | Scream, gunshot, shouting | 00.15.50 - 00.16.42 (950 - 1002) | 942 - 1020
10 | Gunshot | 00.18.08 - 00.18.21 (1088 - 1101) | 1084 - 1122
11 | Laughing | 00.19.54 - 00.20.05 (1194 - 1205) | 1186 - 1230
12 | Gunshot | 00.23.15 - 00.23.55 (1395 - 1435) | 1412 - 1432
13 | Shouting, gunshot, hoofbeat | 00.28.08 - 00.29.04 (1688 - 1744) | 1696 - 1754
14 | Sound of slide-action | 00.30.01 - 00.30.26 (1801 - 1826) | 1814 - 1864
15 | Shouting, gunshot | 00.30.30 - 00.30.46 (1830 - 1846) | 1814 - 1864
16 | Explosion, gunshot, shouting | 00.42.03 - 00.43.00 (2523 - 2580) | 2518 - 2588
17 | Gunshot, shouting | 00.44.15 - 00.45.19 (2655 - 2709) | 2652 - 2710
18 | Sound of slide-action | 00.53.27 - 00.54.46 (3207 - 3286) | 3220 - 3240
19 | Shouting | 00.55.01 - 00.55.31 (3301 - 3331) | 3302 - 3352
20 | Explosion | 00.57.40 - 00.57.56 (3460 - 3476) | 3462 - 3508
21 | Shouting, hoofbeat | 00.58.20 - 00.59.16 (3500 - 3556) | 3516 - 3536
22 | Gunshot | 01.04.02 - 01.04.50 (3842 - 3890) | 3880 - 3918
23 | Gunshot, loud voice | 01.05.02 - 01.05.58 (3902 - 3958) | 3926 - 3990
24 | Beating | 01.10.58 - 01.11.12 (4258 - 4272) | None detected
25 | Drumbeating, gunshot, shouting | 01.14.19 - 01.15.50 (4459 - 4550) | 4478 - 4498, 4506 - 4570
26 | Gunshot, shouting | 01.16.30 - 01.18.40 (4590 - 4720) | 4572 - 4630, 4644 - 4746
27 | Gunshot, shouting | 01.20.23 - 01.21.23 (4823 - 4883) | 4802 - 4826, 4858 - 4886
28 | Loud voice | 01.21.43 - 01.22.27 (4903 - 4947) | 4902 - 4940
29 | Gunshot | 01.23.45 - 01.24.10 (5025 - 5040) | 5026 - 5046

Table 4-6: Comparison of results between the automatic system and the Ground Truth (The Legend of Butch and Sundance)

Event number | Possible exciting events detected by automatic system (not in Ground Truth) | Event location in movie (automatic system), hour.minute.second (seconds) | Length of event (seconds)
1 | Gunshot, loud voice | 00.08.48 - 00.09.30 (528 - 570) | 42
2 | Loud voice | 00.15.10 - 00.15.30 (910 - 930) | 20
3 | Shouting, loud voice, drum | 00.24.14 - 00.24.34 (1454 - 1474) | 20
4 | Gunshot, shouting | 00.32.00 - 00.32.42 (1920 - 1942) | 22
5 | Loud voice, laughing | 00.39.08 - 00.39.42 (2348 - 2382) | 34
6 | Scream, sound of train machine | 00.40.58 - 00.41.46 (2458 - 2506) | 48
7 | Crashing, smash | 00.51.50 - 00.52.10 (3110 - 3130) | 20
8 | Shouting, gunshot | 00.56.28 - 00.57.34 (3388 - 3454) | 66
9 | Applause, whoop | 01.07.02 - 01.07.22 (4022 - 4042) | 20
10 | Speech, loud voice | 01.23.08 - 01.23.28 (4988 - 5008) | 20
11 | Shouting, scream | 01.19.34 - 01.19.54 (4774 - 4794) | 20

Table 4-7: Possible exciting events detected by the automatic system (The Legend of Butch and Sundance)

Event number | Ground Truth events missed by automatic system | Event location in movie (Ground Truth), hour.minute.second (seconds) | Length of event (seconds)
1 | Beating | 01.10.58 - 01.11.12 (4258 - 4272) | 14

Table 4-8: Ground Truth events missed by the automatic system (The Legend of Butch and Sundance)

4.2 Precision and Recall

For a better insight into the accuracy of the result for each video, two important figures of merit for the movie highlight system were calculated.

The Recall measure is the percentage of the Ground Truth events that the system detected:

Recall = 100 * [Number of events (Ground Truth) - Number of events missed] / Number of events (Ground Truth)

The Precision measure is a percentage showing how accurate the system is at exclusively detecting exciting events in the movie, i.e. the share of all events reported by the system that are Ground Truth events:

Precision = 100 * [Number of events (Ground Truth) - Number of events missed] / [Number of events (Ground Truth) - Number of events missed + Number of events suggested by system]

Example: Precision and Recall for Night at the Museum

For Night at the Museum the Precision and Recall figures were calculated as follows (from the information in Table 4-1, Table 4-2 and Table 4-3):

Number of events = 22 (from Ground Truth)
Number of events missed = 2
Number of events falsely identified = 13
Precision = 100 * [(22 - 2) / (22 - 2 + 13)] = 60.61
Recall = 100 * [(22 - 2) / 22] = 90.91

Movie | Precision | Recall
Night at the Museum | 60.61 | 90.91
The Kingdom | 75.00 | 100
The Legend of Butch and Sundance | 73.68 | 96.55

Table 4-9: Precision and Recall values for three movies

Chapter 5 - Conclusions and Further work

5.1 System Evaluation

In all, three movies comprising 296 minutes of digital video were analysed. The following are the main points to be noted:

• The system detected events in all three movies. It found almost all of the Ground Truth events and missed only a few. It also suggested some events that were not in the Ground Truth.
• All three movies attained a Precision greater than 60%.
• All three movies attained a Recall greater than 90%; The Kingdom in particular reached a Recall of 100%.

Night at the Museum is a science fiction movie. Its audio amplitude is fairly uniform over the entire movie, which explains its lower Precision and Recall values. The Kingdom and The Legend of Butch and Sundance are action movies. The exciting events in these two movies have very high audio amplitude, so the detection results for them are better than for Night at the Museum.

In Night at the Museum, the events suggested by the system are almost all loud-voice events or events with music. These may not be exciting events, but they have a high audio amplitude, so the system detected them. Consequently, the system may not work well for this type of movie.

In The Kingdom, all gunshot events were detected. In The Legend of Butch and Sundance, almost all gunshot events were detected, including some gunshots that were not exciting events. These results suggest that the audio amplitude signal could be the basis of a gunshot detection method.

On the other hand, the Ground Truth was compiled manually, and this also affects the comparison of results: the Ground Truth may differ from one person to another. More than one Ground Truth per movie is needed to know how well the automatic detection method performs.

Two detection cases were used in the detection program. The first case is audio amplitude detection over one second. The second case is the audio amplitude threshold time. The two cases gave different results. The results are quite good in the first case, but the results in the second case depend on the audio amplitude threshold time. With a threshold time of one frame or two frames, the results were quite good. With a threshold time of more than one second, the results became inaccurate. This is because the amplitude for a threshold time is calculated by averaging all the audio amplitude frames in that time; e.g. for a threshold time of two seconds, a series of fifty audio amplitude frames is averaged to obtain a single audio amplitude for the two seconds. This method may miss exciting events whose high audio amplitude occurs in only one or two frames while the audio amplitude of the other frames is not high.

5.2 Further work

The system detected almost all of the events in the Ground Truth. However, it also reported some events that were not clearly exciting, so further work is needed to find exactly the events required. The missed events have two causes: the Ground Truth and the audio feature used for detection. Only amplitude was used in this case, so the result was not the best possible. Other audio features should be studied in future, e.g. the frequency content of exciting event audio, the noise in exciting event audio, and the length of the event audio. In addition, the Ground Truth was compiled manually by one person, who may have overlooked some exciting events. Consequently, the Ground Truth should be compiled by more people to obtain a reliable reference.

The movies used in this report are action movies, so almost all of their exciting events were detected. Other types of movie need to be studied, e.g. thrillers and comedies. Thrillers may be a good choice for the next study, because the exciting events in this type of movie usually happen suddenly with high audio amplitude.

The time taken to extract the audio amplitude from a movie is too long: about three or four hours per movie. A more efficient method of obtaining the audio amplitude is needed.

The system uses MPEG-1 Layer II and relies on the Scale Factors to obtain the audio amplitude level. Other movie file types, e.g. MPEG-2 and MPEG-4, should be supported, and another approach to obtaining the Scale Factors will also be tried.

When the audio amplitude threshold changes, the Precision and Recall values also change, for better or worse. The threshold is obtained by multiplying the average audio amplitude by a factor whose optimum value lies in the range [1.8; 2.2].

An adaptive audio amplitude threshold was also tried. It averages the audio amplitude over windows, e.g. a window of twenty audio amplitude frames. The average audio amplitude of the window is multiplied by an optimum value to obtain the window audio amplitude threshold. Every audio amplitude in the window is then compared with the window audio amplitude threshold, and each frame whose amplitude is higher than the threshold is picked. However, the results of this method are not stable, and it needs further study.

References

[1] Marina Bosi, Richard E. Goldberg, Introduction to Digital Audio Coding and Standards, Springer, 2003.
[2] Andreas Spanias, Ted Painter, Venkatraman Atti, Audio Signal Processing and Coding, Wiley, 2007.
[3] Alan F. Smeaton, Bart Lehane, Noel E. O'Connor, Conor Brady and Gary Craig, Automatically Selecting Shots for Action Movie Trailers, 2006.
[4] Sean Marlow, David A. Sadlier, Noel O'Connor, Noel Murphy, Voice Processing for Automatic TV Sports Program Highlights Detection, 2003.
[5] Sean Marlow, David A. Sadlier, Noel O'Connor, Noel Murphy, Audio and Video Processing for Automatic TV Advertisement Detection, 2002.
[6] Tong Zhang, C.-C. Jay Kuo, Heuristic Approach for Generic Audio Data Segmentation and Annotation, 1999.
[7] Tong Zhang, C.-C. Jay Kuo, Video Content Parsing Based on Combined Audio and Visual Information, 1999.
[8] Dian Tjondronegoro, Yi-Ping Phoebe Chen, Binh Pham, Sports Video Summarization using Highlights and Play-Breaks, Berkeley, California, USA, 2003.
[9] Alexander G. Hauptmann, Michael J. Witbrock, Story Segmentation and Detection of Commercials in Broadcast News Video, IEEE Computer Society, 1998.
[10] Aggelos Pikrakis, Theodoros Giannakopoulos, Sergios Theodoridis, Gunshot detection in audio streams from movies by means of dynamic programming and Bayesian networks, ICASSP, 2008.
[11] Ying Li and C.-C. Jay Kuo, Movie Event Detection by Using Audiovisual Information, Springer-Verlag, 2001.
[12] Yong Rui, Anoop Gupta, and Alex Acero, Automatically Extracting Highlights for TV Baseball Programs, ACM, 2000.
[13] A. F. Smeaton, Large Scale Evaluations of Multimedia Information Retrieval: The TRECVid Experience, 2005. http://www.springerlink.com/content/27c0mpeqhu1ugy40/
[14] A. Aslam and E. Yilmaz, A geometric interpretation and analysis of R-Precision, ACM Press, 2005.
[15] The Official MPEG Committee Website, http://www.chiariglione.org/mpeg/

... Night at the Museum 2, The Kingdom, The Legend of Butch and Sundance. There are some detection result graphs of three movies ...
... be the gunshot event, fighting events, crash events, or explosion events. So the audio amplitude may be helpful to highlight the events ...
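
The amplitude thresholding described in sections 4.1.2 and 5.2 can be sketched in a few lines. This is only an illustrative sketch, not the thesis program: it assumes the per-frame audio amplitude levels are already available as a list of numbers (25 per second), and the function names, the default factor of 2.0 (picked from the reported range [1.8; 2.2]) and the non-overlapping 20-frame window are assumptions made for the example.

```python
from statistics import mean

FRAMES_PER_SECOND = 25  # frame rate stated in section 4.1.2


def average_blocks(levels, block_size=FRAMES_PER_SECOND):
    """Average consecutive blocks of levels (e.g. 25 frames -> one value per second)."""
    return [mean(levels[i:i + block_size]) for i in range(0, len(levels), block_size)]


def detect_global(levels, factor=2.0):
    """Indices whose level exceeds factor * the average of the whole sequence."""
    threshold = factor * mean(levels)
    return [i for i, level in enumerate(levels) if level > threshold]


def detect_adaptive(levels, window=20, factor=2.0):
    """Indices whose level exceeds factor * the average of their own window."""
    picked = []
    for start in range(0, len(levels), window):
        chunk = levels[start:start + window]
        threshold = factor * mean(chunk)
        picked.extend(start + i for i, level in enumerate(chunk) if level > threshold)
    return picked


# Two seconds of quiet audio with a single loud frame (hypothetical data).
frames = [1.0] * 50
frames[30] = 20.0

print(detect_global(frames))                  # per-frame threshold -> [30]
print(detect_global(average_blocks(frames)))  # per-second averages -> []
print(detect_adaptive(frames))                # 20-frame windows    -> [30]
```

With this toy input the single loud frame is found at frame resolution but disappears once the frames are averaged per second, which is the effect the text attributes to long threshold times.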

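The Precision and Recall formulas of section 4.2 can be checked against the event counts listed in section 4.1.3. The sketch below assumes only those counts (Ground Truth events, missed events taken as Ground Truth minus detected, and events suggested by the system); the function and variable names are illustrative.

```python
def precision_recall(ground_truth, missed, suggested):
    """Return (precision, recall) in percent, following the section 4.2 formulas."""
    detected = ground_truth - missed
    precision = 100 * detected / (detected + suggested)
    recall = 100 * detected / ground_truth
    return round(precision, 2), round(recall, 2)


# (Ground Truth events, missed events, suggested events) per movie, section 4.1.3
counts = {
    "Night at the Museum": (22, 2, 13),
    "The Kingdom": (30, 0, 10),
    "The Legend of Butch and Sundance": (29, 1, 10),
}

for movie, (ground_truth, missed, suggested) in counts.items():
    print(movie, precision_recall(ground_truth, missed, suggested))
```

Keeping the three counts as separate inputs makes it easy to see how each extra suggested event lowers Precision without affecting Recall.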
Date posted: 17/06/2021, 16:11
