VIETNAM NATIONAL UNIVERSITY - HO CHI MINH CITY
UNIVERSITY OF INFORMATION TECHNOLOGY
FACULTY OF COMPUTER SCIENCE

Object's vibration frequency estimation
using video magnification and applications

PHAN THANH NHAN - 19521944

INSTRUCTOR:
Dr. Nguyen Vinh Tiep
THESIS DEFENSE COMMITTEE

The thesis grading committee, established under Decision No. ..., dated ..., of the Rector of the University of Information Technology:

1. Dr. Lê Minh Hưng - Chairman
2. MSc. Nguyễn Thị Ngọc Diễm - Secretary
3. MSc. Đỗ Văn Tiến - Member
VIETNAM NATIONAL UNIVERSITY - HO CHI MINH CITY          SOCIALIST REPUBLIC OF VIETNAM
UNIVERSITY OF INFORMATION TECHNOLOGY                    Independence - Freedom - Happiness

GRADUATION THESIS TOPIC REGISTRATION

Topic title (Vietnamese): Ước lượng tần số chuyển động của vật thể dựa trên khuếch đại video và ứng dụng
Topic title (English): Object's vibration frequency estimation using video magnification and applications
Working language: English
Supervisor: Dr. Nguyễn Vinh Tiệp
Duration: from 05/09/2022 to 24/12/2022
Student: Phan Thành Nhân - 19521944, class KHCL2019.3
Email: 19521944@gm.uit.edu.vn - Phone: 0918095450

Topic description (detailing the objectives, scope, subjects, methodology, and expected results of the topic):
Topic introduction:

"If you want to find the secrets of the universe, think in terms of energy, frequency and vibration." - Nikola Tesla

As we all know, everything in this universe is formed from particles of matter, and all of them oscillate when they meet their resonant frequency. From loose screws in industrial machinery to the arteries pulsing under our skin, objects constantly produce vibrations with stable frequencies. However, these are motions that the naked human eye cannot see, because their oscillations are too small. These motions can nevertheless be partially recorded by digital cameras and amplified by algorithms as well as deep learning models.

This lets us observe phenomena that previously could not be observed, estimate the vibration frequency of an object, and turn it into more useful and intuitive information for people to consult.

This topic aims to create an application whose input is a video containing a vibrating object; the output is a video in which those motions have been magnified, together with the estimated vibration frequency. The estimated frequency can then be used in applications such as measuring heart rate or recovering sound.

From preliminary research, I found that there are many approaches to the frequency estimation problem; within the scope of this topic, I study the video magnification problem, following the approach of researchers at MIT, with notable work such as Learning-based Video Motion Magnification, which amplifies motion in video and makes estimating the frequency of that motion easier.
Topic objectives:
- Understand the motion magnification problem as well as the frequency estimation problem.
- Apply the method to inputs collected from the real world.
- Visually demonstrate the method and its practical applications in various fields, for example industry and medicine.

Research content:

Task 1: study the pipeline of the method.
- Survey and synthesize the literature on the technologies and techniques used in the relevant papers (motion manipulation / motion magnification, deep convolutional neural networks, visual acoustics), and from there generalize the pipeline of the method.
- Run the provided models and datasets and evaluate them.
- Refine and update the obsolete code into a more suitable version.
- Expected results: a report on the trial runs and detailed technical documentation of the method.

Task 2: build a demo application and real-world test cases.
- Record videos of real phenomena such as machines or the pulse on a wrist.
- Edit the videos to best fit the application.
- Build a demo UI.
- Expected results: a demo application and a performance evaluation.
References
- Eulerian Video Magnification for Revealing Subtle Changes in the World.
- Learning-based Video Motion Magnification. MIT CSAIL, Cambridge, MA, USA. http://people.csail.mit.edu/mrub/vidmag/papers/deepmag.pdf
Implementation plan:
- Phase 1 (05/09/2022 to 15/10/2022): study the papers, learn the technologies used, run the models and datasets used by the problem, and record the model-run parameters.
- Phase 2 (15/10/2022 to 20/11/2022): use other datasets to test the models run in Phase 1, record the parameters and evaluate the results, analyze and point out the factors that affect the predictions, and from there build the most suitable video test cases.
- Phase 3 (20/11/2022 to 20/12/2022): build the demo application and look for ways to improve the prediction results.
- Phase 4 (20/12/2022 until the defense): continue improving the prediction results, write the report, and prepare the thesis defense slides.

Supervisor's confirmation                    Ho Chi Minh City, ..., 2022
(Sign and write full name)                   Student
                                             (Sign and write full name)
Acknowledgments

I would like to begin this thesis by acknowledging all of those to whom I owe this accomplishment. We could not be in the place we are today without all the unwavering support and guidance that was given to us.

To my instructor Dr. Nguyen Vinh Tiep: his guidance has been a shining beacon for my journey. His vast wealth of knowledge has given me countless ideas, and he provided us with state-of-the-art equipment for my research, pointed out my inaccuracies, and gave us priceless instructions. I would like to show my utmost gratitude to Dr. Tiep for his contributions.

To all the teachers and professors in the computer science department, I would like to send my appreciation for all of the clear directions that were given while I was on my way to completing this thesis.

To our friends at MMLab, who have been nothing but the greatest group of friends anyone could ever ask for: without all the help and feedback I received, my thesis could not be as complete as it is today.

Furthermore, I would like to acknowledge my family, to whom I owe everything. My achievement could never have come to fruition without all the love and sacrifices that my family has made.
Contents

1 Abstract
2 Introduction
  2.1 Overview
  2.2 Motivation
  2.3 Applicability
    2.3.1 Applicability in medicine
    2.3.2 Applicability in industry
    2.3.3 Applicability in astronomy
    2.3.4 Sound retrieval from still video
  2.4 Challenges and solutions
    2.4.1 Challenges
      Random noise
      Ringing artifact
    2.4.2 Solutions
      Setup solutions
      Algorithm solutions
  2.5 The main goal
  2.6 Contributions
3 Background
  3.1 Optical flow
  3.2 Motion Magnification (2005)
  3.3 Eulerian Video Magnification [Wu et al. 2012]
  3.4 Phase-Based Video Motion Processing
4 Related Works
  4.1 Learning-based Video Motion Magnification
    4.1.1 Introduction to Learning-based Motion Magnification
      Deep Convolutional Neural Network Architecture
    4.1.2 Synthetic Training Dataset
    4.1.3 Pixel intensity signal tracking
5 Proposed method
  5.0.1 Dataset
  5.0.2 Dataset Collection
  5.0.3 Object's vibration frequency estimation using video magnification and applications
6 Results and Evaluations
  6.0.1 Conclusion
  6.0.2 Future work
List of Figures

2.1 Input and output using video magnification + frequency estimation
2.2 Example: the pulse frequency output using video magnification + frequency estimation
2.3 The result of video magnification + frequency estimation, shown to be reliable compared with a real medical machine
2.4 Example of a machine that can be tracked using a camera
2.5 Example of the sky full of stars
2.6 A mock setup that can be used to test this application
2.7 Example of a ringing artifact: (a) a sharp image; (b) a picture with a ringing artifact
2.8 The setup that was used in the thesis to collect data
3.1 Description of the sample set (training set)
3.2 Description of the sample set (training set)
3.3 Learned regions of support allow features (a) and (b) to reliably track the leaf and background, respectively, despite partial occlusions. For feature (b) on the stationary background, the plots show the x (left) and y (right) coordinates of the track both with (red) and without (blue) a learned region of support for appearance comparisons. The track using a learned region of support is constant, as desired for a feature point on the stationary background
3.4 The input and output frames showing the deformation after magnification
An example of using the Eulerian Video Magnification framework for visualizing the human pulse. (a) Four frames from the original video sequence (face). (b) The same four frames with the subject's pulse signal amplified. (c) A vertical scan line from the input (top) and output (bottom) videos plotted over time shows how the method amplifies the periodic color variation; in the input sequence the signal is imperceptible, but in the magnified sequence the variation is clear. The complete sequence is available in the supplemental video
Overview of the Eulerian video magnification framework
Relationship between temporal processing and motion magnification
Diagram of how the phase-based approach manipulates motion
A big world of small motions: representative frames from videos in which imperceptible motions are amplified; the full sequences and results are available in the supplemental video
3.10 Comparison of results on a common sequence
Comparison of [Wu et al. 2012] and the learning-based approach for motion magnification; the representation size is given as a factor of the original frame size, where k represents the number of orientation bands and n represents the number of filters per octave for each orientation
4.1 Detailed diagram for each part, denoting a convolutional layer of c channels, k x k kernel size, and stride s
4.2 Overview of the architecture. The network consists of 3 main parts: the encoder, the manipulator, and the decoder. During training, the inputs to the network are two video frames, (Xa, Xb), with a magnification factor a, and the output is the magnified frame Y*
4.3 Picture from the MS COCO dataset used as the background
4.4 Picture of segmented objects from the PASCAL VOC dataset
4.5 Applying the network in 2-frame settings, compared in dynamic mode to acceleration magnification; because the latter is based on the complex steerable pyramid, its result suffers from ringing artifacts and blurring
4.6 Example of the result of intensity signal tracking
4.7 Example of the result of intensity signal tracking on a human face
5.1 Testing dataset that was collected
5.2 How the testing dataset was collected
5.3 The collected testing data are sharp and usable
5.4 The collected testing data are sharp and usable
5.5 Detail lost after the process
5.6 Our improvement, adding stopping and anchor points
6.1 One of the test objects that we used: a box fan
6.2 One of the test objects that we used: a pet mouse sleeping
6.3 The output showing the improvement
6.4 The original
6.5 The original
6.6 The output
Chapter 1
Abstract
Motion tracking and frequency estimation are among the most famous and fundamental problems in computer vision. Using algorithms to track the motion of objects, we can estimate the frequency of their motion. Most research in this field, however, focuses on visually prominent motion, like moving cars or walking humans, because it can easily be tracked and monitored. Objects that cannot be easily tracked, like a slightly shaking machine or a flower moving in the wind, are hard to work with, and their motion frequency cannot be estimated because the motion spans only a small pixel area, hence the name small or "hidden" motion. To combat this problem, most research either uses expensive high-resolution cameras to zoom into the object, or simply ignores objects that are too small. Since cameras keep getting better and cheaper over time, this was not seen as a real problem; as a result, algorithm-based approaches are less explored and researched.

But if there were a way for us to see these vibrations, the applications they could bring would be endless. That is the reason a new branch of study called motion magnification was created. With the help of algorithms and new advances in machine learning, a computer can now pick up extremely small pixel vibrations or changes of value in a video frame and magnify them, in other words amplify the vibration signal, resulting in an output video with all the temporal vibrating motion magnified. This helps humans see the previously "hidden" motion with their own eyes; these motions can then be tracked and monitored, giving us the estimated frequency at which the object is resonating. This opens new possible uses for the technology in many different fields of work.

This thesis dives deeply into all of the studies mentioned, with the main goal of applying the state-of-the-art approach, Learning-based Video Motion Magnification [5], to build an object's vibration frequency estimation system. We follow the path of the paper by running the pre-trained model and all the data that the authors provided, helping us see and understand the inner workings of the approach. We then collect and build a testing dataset for real-world applications, since all of the data the authors provided is old, low-resolution, and lacking in test cases. After that, we fine-tune the model, upgrade some of its obsolete parts, and improve some of its drawbacks to better fit our solution, checking the results on the test data we made. Finally, we implement all of the proposed improvements in the system we build and evaluate the results.

Keywords: motion magnification, frequency estimation
Chapter 2

Introduction

2.1 Overview
Any movement can be the result of one or more motion waves hitting the surface of an object, making it resonate and vibrate at a frequency. These motions happen only in a small area of pixels, and the change can sometimes be less than 10 pixels of difference from frame to frame, hence they are considered too small to track. But these small vibrations can be picked up by a camera and amplified using a mathematical algorithm or, in this case, a deep convolutional neural network (CNN) called learning-based motion magnification [5], which filters out the small motions in the video, magnifies them to a desired degree, and returns a video with altered frames that better visualize the deformations or vibrations in it. The same process also captures and estimates the frequency at which the object is vibrating.

Input: a video containing some small vibration against a still background (recorded by a standard commercial camera or a smartphone).

Output: the same video with altered frames that make the vibration clearly and visually detectable, plus the estimated frequency at which the object is vibrating.
Figure 2.1: Input and output using video magnification + frequency estimation. (a) Input (wrist); (b) Motion-amplified

Figure 2.2: Example: the pulse frequency output using video magnification + frequency estimation
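The frequency-estimation half of this input/output specification can be illustrated with a short Fourier analysis of the per-frame mean intensity. This is only a hedged sketch, not the thesis pipeline: the function name `estimate_vibration_frequency` and the reduction of each frame to its mean intensity are illustrative assumptions, and in practice the signal would be taken from the magnified video.

```python
import numpy as np

def estimate_vibration_frequency(frames, fps):
    """Estimate the dominant vibration frequency of a video clip.

    frames: array of shape (T, H, W) holding grayscale frames.
    fps: capture frame rate in Hz.
    """
    # Collapse each frame to its mean intensity: small periodic motion
    # modulates this 1-D signal at the object's vibration frequency.
    signal = frames.reshape(len(frames), -1).mean(axis=1)
    signal = signal - signal.mean()               # drop the DC component
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    return freqs[np.argmax(spectrum)]             # dominant frequency in Hz

# Synthetic check: a 3 Hz flicker sampled at 30 fps for 3 seconds.
t = np.arange(90) / 30.0
flicker = 0.5 + 0.1 * np.sin(2 * np.pi * 3.0 * t)
frames = flicker[:, None, None] * np.ones((90, 8, 8))
print(estimate_vibration_frequency(frames, fps=30))  # close to 3.0
```

On real footage, the spectral peak is easier to find after magnification, since the motion signal is lifted further above the noise floor.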
2.2 Motivation
After a deep dive into the subject of frequency estimation, I found that many studies have tried to solve this problem, but all of them work only on the original input and try to estimate the frequency on their own. Most of the time they only handle clear and visible object motions and consider all other motions too small to detect, ignoring them completely. By adding a motion magnification layer, we can intensify the motion to a more noticeable degree, opening up the scope of the topic to cases that previous studies have not considered. I believe there is room for improvement, so this thesis chose to explore it.
Everything in this universe is vibrating, oscillating, and resonating at various frequencies. An object might seem motionless, yet every particle that makes it up is in constant motion. Every object has its own vibration frequency; at a certain frequency, objects resonate and create a vibrating motion. This is a natural phenomenon we can observe in real life, like a car engine vibrating or a flower moving ever so slightly in the wind. But most of the time these motions are too small for human eyes to actually see, so we often ignore them and move on with our daily lives. For many decades, humans have tried to measure the vibration of objects with many different tools, from gyroscopes to microscopic sensors, and most recently camera-based sensors that estimate the frequency of moving objects; but this last method is still not developed enough to reliably estimate small motions.
Since the beginning of this field of research, the heavy focus has been on amplifying subtle changes in the signal of every pixel: a change in color value, or the movement of a group of pixels over a period of time. Previous attempts analyze and amplify subtle motions and visualize deformations in video sequences. These approaches follow a Lagrangian [6] perspective, in reference to fluid dynamics, where the trajectory of particles is tracked over time; here, instead of fluid particles, every pixel is treated as an individual particle. Small motions in a video are captured and amplified by identifying pixels in a frame and tracking them through the temporal changes of every frame, allowing visualization of deformations that would otherwise be invisible. But this also opens the floodgates to noise and artifacts polluting the final output, and in terms of algorithmic complexity the computing power required is immense, making the approach highly inflexible compared with later methods, which have proven to have greater potential for more accurate applications.
2.3 Applicability
In this day and age, any kind of data can be processed to give us valuable information about our world, helping us better understand it and situate ourselves in a more advanced position.

With the ability to estimate the vibration frequency of an object, this study can be used as a measurement aid to help users better visualize their surroundings and make better adjustments. It can be used in multiple industries, and its potential is endless. Some of the real-world use cases are:
2.3.1 Applicability in medicine

Hospitals rely on heart rate monitors among other measurement equipment for vital signals. This can be a problem, since a standard heart rate monitor (HRM) or electrocardiogram (EKG/ECG) device can cost anywhere between 110 and 188 US dollars according to MDSAVE.COM [4], a leading healthcare equipment provider based in San Francisco, USA; equipping a room of 10 patients can cost at least 1,100 dollars, or around 27.5 million VND. With the ability to track and magnify the change in a patient's skin color, the deep convolutional neural network can return a reliable estimation of heart rate.

Monitoring an infant is a similar case: attaching sensors to a child's body calls for complex workarounds, like an electronic mat that measures the heartbeat or a small microphone placed close to the baby's mouth. All of these solutions are invasive and not very scalable, but with a motion magnification camera we can provide a non-invasive and reliable method.
We can all remember the shortage of medical equipment during the COVID-19 pandemic. This new technology can play a vital role in future health emergencies, since it can be turned into an app and put online for everyone to use on their phones, opening the chance for wider medical application.
Figure 2.3: The result of video magnification + frequency estimation, shown to be reliable compared with a real medical machine
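As a rough illustration of the pulse-estimation idea above, the sketch below looks for a spectral peak of a skin patch's intensity signal inside a plausible pulse band. The function name, the band limits, and the mean-intensity reduction are assumptions for illustration, not the validated method behind the figure.

```python
import numpy as np

def estimate_heart_rate_bpm(roi_frames, fps, band=(0.7, 4.0)):
    """Estimate pulse rate from a skin region of a (magnified) video.

    roi_frames: (T, H, W) intensities of a skin patch over T frames.
    band: plausible pulse range in Hz (0.7-4.0 Hz is 42-240 bpm),
          used to exclude breathing and high-frequency noise peaks.
    """
    signal = roi_frames.reshape(len(roi_frames), -1).mean(axis=1)
    signal = signal - signal.mean()
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    keep = (freqs >= band[0]) & (freqs <= band[1])    # pulse band only
    return freqs[keep][np.argmax(spectrum[keep])] * 60.0  # Hz -> bpm

# Synthetic pulse at 1.2 Hz (72 bpm): 10 s of 30 fps video of a 4x4 patch.
t = np.arange(300) / 30.0
roi = (100 + 2 * np.sin(2 * np.pi * 1.2 * t))[:, None, None] * np.ones((300, 4, 4))
print(estimate_heart_rate_bpm(roi, fps=30))  # close to 72
```

Restricting the search to a physiological band is what makes this robust: the strongest raw spectral peak in real footage is often camera noise or lighting flicker, not the pulse.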
2.3.2 Applicability in industry
Vibration is a closely monitored factor in every industry. Any moving machine, or moving part in a machine, has the potential to malfunction and move in a way that damages itself and the whole machine; a loose screw can make a machine shake during operation, creating unnecessary wear and tear and shortening the lifetime of the equipment.

For example, in a running motor, one of the base screws may loosen over a long period of operation, resulting in prolonged, small but intensifying vibration. Over time the screw gets looser and looser, finally breaking the base of the pump through violent vibration. This is one of the most common ways to lose a valuable machine, all because of a loose screw.

To combat this problem, many companies choose to put motion sensors on their machines to monitor and track the vibration of the moving parts. But then the problem becomes how to put millions of sensors on millions of moving parts in a factory.

Monitoring these kinds of vibrations also requires a trained technician with expensive measurement tools. The ability for a company to simply buy software and a good security camera to monitor all of its equipment would be an industry-changing, even groundbreaking, solution: no need for expensive monitoring sensors or for putting humans in hazardous environments, since a camera can stream the live feed to the main control room where everything is processed, truly Industry 4.0 style.

Figure 2.4: Example of a machine that can be tracked using a camera
2.3.3 Applicability in astronomy

With powerful telescopes, scientists can track stars and planets moving in orbits light-years away. Based on the frequency of the light coming from a star, they can learn about the composition of a planet and whether it has any moons orbiting it. For a long time, astronomers have pioneered state-of-the-art solutions to this very problem; they are no strangers to this field of study and are among the most productive contributors to the development of mathematics and algorithms. With the rise of artificial intelligence and machine learning, such as this study on Learning-based Video Motion Magnification, a promising future is right ahead: building on the groundwork of astrophysics, we can study and better understand our universe.

Just by magnifying the motion of a star, you can see its spin direction, estimate its speed, and count how many moons it has; the applications are endless.
2.3.4 Sound retrieval from still video
The reason this application can be called a "gimmick" is that there are not many real-world uses for it, except in highly specific situations.

In principle, sound waves move through the air and hit the surface of an object, making it resonate at a corresponding frequency. Since the motion magnification CNN has the ability to estimate the vibration frequency of an object, we can in theory capture the frequency of the original sound wave and then use a Fourier transform step to retrieve the actual sound.
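The chain described above, from a per-frame vibration signal to an audible waveform, can be hinted at with a toy resampling step. This is a loose sketch under strong assumptions: it presumes the displacement signal has already been extracted from the video, and real sound recovery needs a high-speed camera, since half the frame rate bounds the highest recoverable pitch.

```python
import numpy as np

def motion_signal_to_audio(displacement, fps, audio_rate=44100):
    """Turn a per-frame vibration signal into an audio waveform.

    displacement: 1-D array with one motion sample per video frame.
    fps: video frame rate; fps / 2 bounds the highest recoverable
         pitch, which is why real "visual microphone" setups record
         at hundreds or thousands of frames per second.
    Returns samples in [-1, 1] at audio_rate, by linear interpolation.
    """
    x = displacement - displacement.mean()
    peak = np.abs(x).max()
    if peak > 0:
        x = x / peak                               # normalize amplitude
    t_video = np.arange(len(x)) / fps
    n_audio = int(len(x) / fps * audio_rate)       # keep the same duration
    t_audio = np.arange(n_audio) / audio_rate
    return np.interp(t_audio, t_video, x)

# A 440 Hz tone "seen" by a 2200 fps high-speed camera for 0.1 s.
t = np.arange(220) / 2200.0
audio = motion_signal_to_audio(np.sin(2 * np.pi * 440.0 * t), fps=2200)
```

The resulting array could be written to a WAV file for playback; the quality of real recovered sound depends almost entirely on how cleanly the displacement signal was extracted.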
2.4 Challenges and solutions

2.4.1 Challenges

A standard camera like what we have on our phones can barely pick up any motion if it is out of its focus range. There are also the problems of random noise artifacts and border ringing artifacts; these two types of artifacts contaminate almost all input videos and will be magnified in the output video.
Past approaches use complex hand-crafted filters which cannot distinguish what is noise and what is not; everything is just "pixel changes" in the eyes of these filters.
Random noise
A handheld camera records pictures or video using a system of lenses and an electronic sensor that captures light and turns it into digital form. This process involves a load of uncontrollable variations, like static, defects on the lens, or random light reflections inside the camera, all of which show up as small, random noise on the recorded video. There is almost no way to block this kind of random noise in real life; it can only be reduced to an acceptable level.

A handcrafted filter can magnify the motion signal, but it can also magnify the noise, since, as noted, it does not really know what is noise and what is motion.
Ringing artifact
In digital image processing, ringing artifacts are artifacts that appear near sharp transitions in a signal. Visually, they look like bands or "ghosts" near edges. The term "ringing" refers to the way the output signal oscillates at a fading rate around a sharp transition in the input, similar to a bell after being struck. As with other artifacts, their minimization is a criterion in filter design. This behavior comes from the underlying physics and will always be there.
Figure 2.7: Example of a ringing artifact: (a) a sharp image; (b) a picture with a ringing artifact
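Ringing is easy to reproduce in one dimension: an ideal "brick-wall" low-pass filter applied to a sharp step makes the output overshoot and oscillate near the edge (the Gibbs phenomenon), which is exactly the halo seen around image edges. A minimal numpy demonstration:

```python
import numpy as np

# Ringing (Gibbs) in one dimension: keep only the lowest frequencies of
# a sharp step, and the reconstruction overshoots and oscillates at the
# edge instead of reproducing the clean transition.
n = 256
step = np.zeros(n)
step[n // 2:] = 1.0                      # sharp transition

spectrum = np.fft.fft(step)
cutoff = 16                              # number of low-frequency bins kept
spectrum[cutoff:n - cutoff + 1] = 0.0    # ideal "brick-wall" low-pass
filtered = np.fft.ifft(spectrum).real

overshoot = filtered.max() - step.max()  # ideal filters overshoot ~9%
print(f"overshoot past the step: {overshoot:.3f}")
```

The same thing happens per scanline in an image whenever a filter with a very sharp frequency cutoff is involved, which is why steerable-pyramid-based magnification methods tend to ring around strong edges.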
2.4.2 Solutions
Setup solutions
As stated above, this kind of static or noise cannot be eliminated completely, since the nature of the problem lies in random environmental factors and micro-defects in the manufacture of the camera. We can, however, reduce it to a small and acceptable range with the following steps:

- Stabilize the camera: putting the camera on a tripod helps reduce vibration and lets us position the camera at an optimal viewing angle.
- Lighting: a constant, bright light source illuminates the object, reduces blending between it and the background, removes shadows, and helps us better identify all the features of the object.
- Background: a good background can make all the difference; since the model was trained to pick out the change between the background and a moving object, a clear background greatly improves the result.
- A quality camera: the better the camera, the better the output, as better equipment allows recording at greater resolution and in finer detail. But remember that almost all cameras nowadays have some sort of stabilization setting; for our study, turn this feature off.
Algorithm solutions
With the new Learning-based Video Motion Magnification, the algorithm can now understand what is and is not motion and gives back a better result than ever before: fewer ringing artifacts and less random noise. Overall, the output is sharper and more visually pleasing, which plays a great role in getting the best frequency estimation.
2.5 The main goal

Throughout this thesis, our main goal is to learn all we can from this field of study. From the knowledge gathered, we aim to replicate the authors' results and see how well the method operates on our own test data. After this primary goal, we will also try out the real-world uses we envision, then implement the improvements we come up with and test the results.
+ The input and output components are updated to Python.
+ All of the legacy libraries are reconfigured so the code can run on a newer version of Python.

3. Create testing data that can be used by later researchers. After running some tests on the code and data we were provided, we realized that the data for visualization testing, or testing in general, is very limited; most people just reuse the same old low-quality videos. We understand that using the same video helps everyone share the same ground truth, but it would be quite boring to see the same example over and over again.

4. Propose some new ways to modify the model and the input to improve the results and to address cases where the model gives degraded results.
Chapter 3

Background

3.1 Optical flow

Optical flow estimates the apparent motion of objects between frames by comparing the two continuous images.
At this point in time, the Lucas-Kanade optical flow algorithm is one of the best ways to track motion. It works by laying down some rules to follow:
- The two images are part of a time series separated by a small time increment $\Delta t$, such that objects have not displaced significantly (that is, the algorithm works best with slow-moving objects).
- The images depict a natural scene containing textured objects exhibiting shades of gray (different intensity levels) which change smoothly.

The algorithm does not need to use color information, and it does not scan the second image looking for a match to a given pixel. It works by trying to guess in which direction an object has moved so that local changes in intensity can be explained. Of course, a single pixel does not usually contain enough "structure" to be matched against another pixel; it is better to use a neighborhood of pixels, for example the 3 x 3 neighborhood around the pixel (x, y).
For each of the nine pixels in that neighborhood, the brightness-constancy constraint gives one equation:

$I_x(x+\Delta x, y+\Delta y)\,u + I_y(x+\Delta x, y+\Delta y)\,v = -I_t(x+\Delta x, y+\Delta y)$, for $\Delta x, \Delta y \in \{-1, 0, 1\}$.

In matrix form this is $S \binom{u}{v} = t$, where $S$ is a $9 \times 2$ matrix whose rows are $\big(I_x(x+\Delta x, y+\Delta y),\ I_y(x+\Delta x, y+\Delta y)\big)$ and $t$ is the vector containing the nine terms $-I_t(x+\Delta x, y+\Delta y)$.

The above system cannot be solved exactly (in the general case). The least squares solution is found by multiplying the equation by $S^T$,

$S^T S \binom{u}{v} = S^T t,$

and inverting $S^T S$, so that

$\binom{u}{v} = (S^T S)^{-1} S^T t.$
At first this may not sound like optical flow at all, but under the hood it is just a smarter way to implement the algorithm; the idea behind it is the same.

We can say that the Lucas-Kanade algorithm makes a "best guess" of the displacement of a neighborhood by looking at changes in pixel intensity that can be explained by the known intensity gradients of the image in that neighborhood. For a single pixel we have two unknowns (u and v) and one equation, that is, the system is underdetermined, and we need a neighborhood in order to get more equations. Doing so makes the system overdetermined, and we have to find a least squares solution; the LSQ solution averages the optical flow guesses over a neighborhood.

The Lucas-Kanade algorithm is an efficient method for obtaining optical flow information at interesting points in an image (i.e. those exhibiting enough intensity gradient information), and it works for moderate object speeds.
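The least-squares derivation above can be written out directly in a few lines of numpy. This is a didactic single-point, single-iteration sketch (no pyramid, no iterative refinement; the function name and window size are illustrative), not a production tracker:

```python
import numpy as np

def lucas_kanade(prev, curr, x, y, win=3):
    """Least-squares Lucas-Kanade flow for a single point (x, y).

    Builds the overdetermined system S [u v]^T = t over the
    (2*win+1)^2 neighborhood and solves the normal equations,
    exactly as in the derivation above.
    """
    Iy, Ix = np.gradient(prev.astype(float))       # spatial derivatives
    It = curr.astype(float) - prev.astype(float)   # temporal derivative

    ys, xs = np.mgrid[y - win:y + win + 1, x - win:x + win + 1]
    S = np.column_stack([Ix[ys, xs].ravel(), Iy[ys, xs].ravel()])
    t = -It[ys, xs].ravel()
    uv, *_ = np.linalg.lstsq(S, t, rcond=None)     # (S^T S)^-1 S^T t
    return uv                                      # (u, v) in pixels

# Synthetic check: a smooth blob shifted right by exactly one pixel.
xx, yy = np.meshgrid(np.arange(32), np.arange(32))
prev = np.exp(-((xx - 15.0) ** 2 + (yy - 16.0) ** 2) / 40.0)
curr = np.exp(-((xx - 16.0) ** 2 + (yy - 16.0) ** 2) / 40.0)
u, v = lucas_kanade(prev, curr, x=15, y=16)
print(u, v)  # u close to 1, v close to 0
```

Real implementations repeat this solve inside a coarse-to-fine image pyramid, which is what lets the method handle displacements larger than the one-pixel linearization assumes.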
3.2 Motion Magnification (2005)

In this first paper, which can arguably be called the start of this whole branch of study, the authors look at the problem as a way to track not one but all of the pixel movements in the video frame and then follow them through the temporal changes of the video. By doing so, they can capture small motions in the frame and then magnify them, revealing deformations that would otherwise be invisible to human eyes.

Figure 3.2: Description of the sample set (training set)
The paper deals with moving objects by treating them as groups of commonly segmented pixels, based on similarity of position, color, and motion. After tracking all of the pixels and their trajectories, the method groups them into smaller clusters based on their cluster features. This was a novel approach, since no one had framed the problem this way before.

Once the images have been registered, feature points are found and tracked a second time. The goal of this feature tracking is to find the trajectories of a reliable set of feature points to represent the motions in the video. As before, the steps consist of feature point detection, SSD matching, and local Lucas-Kanade refinement.
Figure 3.3: Learned regions of support allow features (a) and (b) to reliably track the leaf and background, respectively, despite partial occlusions. For feature (b) on the stationary background, the plots show the x (left) and y (right) coordinates of the track both with (red) and without (blue) a learned region of support for appearance comparisons. The track using a learned region of support is constant, as desired for a feature point on the stationary background
After segmenting all the layers and determining the background, the user selects the motion layer to be magnified, and the displacements of each pixel in the cluster are multiplied by a selected factor, usually between 4 and 40 in the study.
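The core amplification step of this Lagrangian method is simple once trajectories exist: each tracked point's displacement from its rest position is scaled by the magnification factor. A hedged sketch (the array shapes and the use of the temporal mean as the rest position are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def magnify_trajectories(tracks, alpha=10.0):
    """Lagrangian motion magnification of tracked feature points.

    tracks: (T, N, 2) x-y positions of N feature points over T frames.
    Each point's displacement from its mean (rest) position is scaled
    by alpha; the 2005 paper uses factors of roughly 4 to 40.
    """
    rest = tracks.mean(axis=0, keepdims=True)      # per-point rest position
    return rest + alpha * (tracks - rest)

# One point vibrating horizontally with +/-0.2 px amplitude at 5 Hz.
t = np.linspace(0.0, 1.0, 50, endpoint=False)
xy = np.stack([10.0 + 0.2 * np.sin(2 * np.pi * 5.0 * t),
               np.full(50, 20.0)], axis=1)         # (50, 2) positions
tracks = xy[:, None, :]                            # (50, 1, 2)
magnified = magnify_trajectories(tracks, alpha=10.0)
# The vibration amplitude grows by exactly alpha; the rest position stays.
```

The hard part of the 2005 pipeline is everything around this step: reliable tracking, layer segmentation, and texture synthesis to fill the holes the amplified motion uncovers.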
The claim is that this technique can act like a microscope for visual motion: it can amplify subtle motions in a video sequence, allowing visualization of deformations that would otherwise be invisible. To achieve motion magnification, the method must accurately measure visual motions and group the pixels to be modified. After an initial image registration step, motion is measured by a robust analysis of feature point trajectories, and pixels are segmented based on similarity of position, color, and motion. A novel measure of motion similarity groups even very small motions according to correlation over time, which often relates to a physical cause. An outlier mask marks observations not explained by the layered motion model, and those pixels are simply reproduced in the output from the original registered observations. The motion of any selected layer may be magnified by a user-specified amount, with texture synthesis filling in unseen "holes" revealed by the amplified motions. The resulting motion-magnified images can reveal or emphasize small motions in the original sequence, as demonstrated with deformations in load-bearing structures, subtle motions or balancing corrections of people, and "rigid" structures bending under hand pressure.