VIETNAM NATIONAL UNIVERSITY - HO CHI MINH CITY
UNIVERSITY OF INFORMATION TECHNOLOGY
FACULTY OF COMPUTER SCIENCE

Object's vibration frequency estimation
using video magnification and applications

PHAN THANH NHAN - 19521944

INSTRUCTOR:
Dr. Nguyen Vinh Tiep
THESIS DEFENSE COMMITTEE

The thesis grading committee, established under Decision No. ..., dated ..., of the Rector of the University of Information Technology:

1. Dr. Lê Minh Hưng - Chairman
2. MSc. Nguyễn Thị Ngọc Diễm - Secretary
3. MSc. Đỗ Văn Tiến - Member
VIETNAM NATIONAL UNIVERSITY - HO CHI MINH CITY          SOCIALIST REPUBLIC OF VIETNAM
UNIVERSITY OF INFORMATION TECHNOLOGY                    Independence - Freedom - Happiness

GRADUATION THESIS TOPIC REGISTRATION

Topic title (Vietnamese): Ước lượng tần số chuyển động của vật thể dựa trên khuếch đại video và ứng dụng
Topic title (English): Object's vibration frequency estimation using video magnification and applications
Working language: English
Supervisor: Dr. Nguyễn Vinh Tiệp
Duration: from 05/09/2022 to 24/12/2022
Student: Phan Thành Nhân - 19521944, class KHCL2019.3
Email: 19521944@gm.uit.edu.vn - Phone: 0918095450

Topic description (detailing the objectives, scope, subjects, methodology, and expected results of the topic):
Topic introduction:

"If you want to find the secrets of the universe, think in terms of energy, frequency and vibration." - Nikola Tesla

As we all know, everything in this universe is formed from particles of matter, and all of them oscillate when they meet their resonant frequency. From loose screws in industrial machinery to the arteries pulsing under our skin, objects constantly produce vibrations with stable frequencies. However, these are motions that the naked human eye cannot see, because their oscillations are too small. These motions can nevertheless be partially recorded by digital cameras and amplified by algorithms as well as deep learning models.

This lets us observe phenomena that previously could not be observed, estimate the vibration frequency of an object, and turn it into more useful and intuitive information for people to consult.

This topic aims to create an application whose input is a video containing a vibrating object; the output is a video in which those motions have been magnified, together with the estimated vibration frequency. The estimated frequency can then be used in applications such as measuring heart rate or recovering sound.

From preliminary research, I found that there are many approaches to the frequency estimation problem; within the scope of this topic, I study the video magnification problem, following the approach of researchers at MIT, with notable work such as Learning-based Video Motion Magnification, which amplifies motion in video and makes estimating the frequency of that motion easier.
Topic objectives:
- Understand the motion magnification problem as well as the frequency estimation problem.
- Apply the method to inputs collected from the real world.
- Visually demonstrate the method and its practical applications in various fields, for example industry and medicine.

Research content:

Task 1: study the pipeline of the method.
- Survey and synthesize the literature on the technologies and techniques used in the relevant papers (motion manipulation / motion magnification, deep convolutional neural networks, visual acoustics), and from there generalize the pipeline of the method.
- Run the provided models and datasets and evaluate them.
- Refine and update the obsolete code into a more suitable version.
- Expected results: a report on the trial runs and detailed technical documentation of the method.

Task 2: build a demo application and real-world test cases.
- Record videos of real phenomena such as machines or the pulse on a wrist.
- Edit the videos to best fit the application.
- Build a demo UI.
- Expected results: a demo application and a performance evaluation.
References
- Eulerian Video Magnification for Revealing Subtle Changes in the World.
- Learning-based Video Motion Magnification. MIT CSAIL, Cambridge, MA, USA. http://people.csail.mit.edu/mrub/vidmag/papers/deepmag.pdf
Implementation plan:
- Phase 1 (05/09/2022 to 15/10/2022): study the papers, learn the technologies used, run the models and datasets used by the problem, and record the model-run parameters.
- Phase 2 (15/10/2022 to 20/11/2022): use other datasets to test the models run in Phase 1, record the parameters and evaluate the results, analyze and point out the factors that affect the predictions, and from there build the most suitable video test cases.
- Phase 3 (20/11/2022 to 20/12/2022): build the demo application and look for ways to improve the prediction results.
- Phase 4 (20/12/2022 until the defense): continue improving the prediction results, write the report, and prepare the thesis defense slides.

Supervisor's confirmation                    Ho Chi Minh City, ..., 2022
(Sign and write full name)                   Student
                                             (Sign and write full name)
Acknowledgments

I would like to begin this thesis by acknowledging all of those to whom I owe this accomplishment. We could not be in the place we are today without all the unwavering support and guidance that was given to us.

To my instructor Dr. Nguyen Vinh Tiep: his guidance has been a shining beacon for my journey. His vast wealth of knowledge has given me countless ideas, and he provided us with state-of-the-art equipment for my research, pointed out my inaccuracies, and gave us priceless instructions. I would like to show my utmost gratitude to Dr. Tiep for his contributions.

To all the teachers and professors in the computer science department, I would like to send my appreciation for all of the clear directions that were given while I was on my way to completing this thesis.

To our friends at MMLab, who have been nothing but the greatest group of friends anyone could ever ask for: without all the help and feedback I received, my thesis could not be as complete as it is today.

Furthermore, I would like to acknowledge my family, to whom I owe everything. My achievement could never have come to fruition without all the love and sacrifices that my family has made.
Contents

1 Abstract
2 Introduction
  2.1 Overview
  2.2 Motivation
  2.3 Applicability
    2.3.1 Applicability in medicine
    2.3.2 Applicability in industry
    2.3.3 Applicability in astronomy
    2.3.4 Sound retrieval from still video
  2.4 Challenges and solutions
    2.4.1 Challenges
      Random noise
      Ringing artifact
    2.4.2 Solutions
      Setup solutions
      Algorithm solutions
  2.5 The main goal
  2.6 Contributions
3 Background
  3.1 Optical flow
  3.2 Motion Magnification (2005)
  3.3 Eulerian Video Magnification [Wu et al. 2012]
  3.4 Phase-Based Video Motion Processing
4 Related Works
  4.1 Learning-based Video Motion Magnification
    4.1.1 Introduction to Learning-based Motion Magnification
      Deep Convolutional Neural Network Architecture
    4.1.2 Synthetic Training Dataset
    4.1.3 Pixel intensity signal tracking
5 Proposed method
  5.0.1 Dataset
  5.0.2 Dataset Collection
  5.0.3 Object's vibration frequency estimation using video magnification and applications
6 Results and Evaluations
  6.0.1 Conclusion
  6.0.2 Future work
List of Figures

2.1 Input and output using video magnification + frequency estimation
2.2 Example: the pulse frequency output using video magnification + frequency estimation
2.3 The result of video magnification + frequency estimation, shown to be reliable compared with a real medical machine
2.4 Example of a machine that can be tracked using a camera
2.5 Example of the sky full of stars
2.6 A mock setup that can be used to test this application
2.7 Example of a ringing artifact: (a) a sharp image; (b) a picture with a ringing artifact
2.8 The setup that was used in the thesis to collect data
3.1 Description of the sample set (training set)
3.2 Description of the sample set (training set)
3.3 Learned regions of support allow features (a) and (b) to reliably track the leaf and background, respectively, despite partial occlusions. For feature (b) on the stationary background, the plots show the x (left) and y (right) coordinates of the track both with (red) and without (blue) a learned region of support for appearance comparisons. The track using a learned region of support is constant, as desired for a feature point on the stationary background
3.4 The input and output frames showing the deformation after magnification
An example of using the Eulerian Video Magnification framework for visualizing the human pulse. (a) Four frames from the original video sequence (face). (b) The same four frames with the subject's pulse signal amplified. (c) A vertical scan line from the input (top) and output (bottom) videos plotted over time shows how the method amplifies the periodic color variation; in the input sequence the signal is imperceptible, but in the magnified sequence the variation is clear. The complete sequence is available in the supplemental video
Overview of the Eulerian video magnification framework
Relationship between temporal processing and motion magnification
Diagram of how the phase-based approach manipulates motion
A big world of small motions: representative frames from videos in which imperceptible motions are amplified; the full sequences and results are available in the supplemental video
3.10 Comparison of results on a common sequence
Comparison of [Wu et al. 2012] and the learning-based approach for motion magnification; the representation size is given as a factor of the original frame size, where k represents the number of orientation bands and n represents the number of filters per octave for each orientation
4.1 Detailed diagram for each part, denoting a convolutional layer of c channels, k x k kernel size, and stride s
4.2 Overview of the architecture. The network consists of 3 main parts: the encoder, the manipulator, and the decoder. During training, the inputs to the network are two video frames, (Xa, Xb), with a magnification factor a, and the output is the magnified frame Y*
4.3 Picture from the MS COCO dataset used as the background
4.4 Picture of segmented objects from the PASCAL VOC dataset
4.5 Applying the network in 2-frame settings, compared in dynamic mode to acceleration magnification; because the latter is based on the complex steerable pyramid, its result suffers from ringing artifacts and blurring
4.6 Example of the result of intensity signal tracking
4.7 Example of the result of intensity signal tracking on a human face
5.1 Testing dataset that was collected
5.2 How the testing dataset was collected
5.3 The collected testing data are sharp and usable
5.4 The collected testing data are sharp and usable
5.5 Detail lost after the process
5.6 Our improvement, adding stopping and anchor points
6.1 One of the test objects that we used: a box fan
6.2 One of the test objects that we used: a pet mouse sleeping
6.3 The output showing the improvement
6.4 The original
6.5 The original
6.6 The output
Chapter 1
Abstract
Motion tracking and frequency estimation are among the most famous and fundamental problems in computer vision. Using algorithms to track the motion of objects, we can estimate the frequency of their motion. Most research in this field, however, focuses on visually prominent motion, like moving cars or walking humans, because it can easily be tracked and monitored. Objects that cannot be easily tracked, like a slightly shaking machine or a flower moving in the wind, are hard to work with, and their motion frequency cannot be estimated because the motion spans only a small pixel area, hence the name small or "hidden" motion. To combat this problem, most research either uses expensive high-resolution cameras to zoom into the object, or simply ignores objects that are too small. Since cameras keep getting better and cheaper over time, this was not seen as a real problem; as a result, algorithm-based approaches are less explored and researched.

But if there were a way for us to see these vibrations, the applications they could bring would be endless. That is the reason a new branch of study called motion magnification was created. With the help of algorithms and new advances in machine learning, a computer can now pick up extremely small pixel vibrations or changes of value in a video frame and magnify them, in other words amplify the vibration signal, resulting in an output video with all the temporal vibrating motion magnified. This helps humans see the previously "hidden" motion with their own eyes; these motions can then be tracked and monitored, giving us the estimated frequency at which the object is resonating. This opens new possible uses for the technology in many different fields of work.

This thesis dives deeply into all of the studies mentioned, with the main goal of applying the state-of-the-art approach, Learning-based Video Motion Magnification [5], to build an object's vibration frequency estimation system. We follow the path of the paper by running the pre-trained model and all the data that the authors provided, helping us see and understand the inner workings of the approach. We then collect and build a testing dataset for real-world applications, since all of the data the authors provided is old, low-resolution, and lacking in test cases. After that, we fine-tune the model, upgrade some of its obsolete parts, and improve some of its drawbacks to better fit our solution, checking the results on the test data we made. Finally, we implement all of the proposed improvements in the system we build and evaluate the results.

Keywords: motion magnification, frequency estimation
Chapter 2

Introduction

2.1 Overview
Any movement can be the result of one or more motion waves hitting the surface of an object, making it resonate and vibrate at a frequency. These motions happen only in a small area of pixels, and the change can sometimes be less than 10 pixels of difference from frame to frame, hence they are considered too small to track. But these small vibrations can be picked up by a camera and amplified using a mathematical algorithm or, in this case, a deep convolutional neural network (CNN) called learning-based motion magnification [5], which filters out the small motions in the video, magnifies them to a desired degree, and returns a video with altered frames that better visualize the deformations or vibrations in it. The same process also captures and estimates the frequency at which the object is vibrating.

Input: a video containing some small vibration against a still background (recorded by a standard commercial camera or a smartphone).

Output: the same video with altered frames that make the vibration clearly and visually detectable, plus the estimated frequency at which the object is vibrating.
Figure 2.1: Input and output using video magnification + frequency estimation. (a) Input (wrist); (b) Motion-amplified

Figure 2.2: Example: the pulse frequency output using video magnification + frequency estimation
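The frequency-estimation half of this input/output specification can be illustrated with a short Fourier analysis of the per-frame mean intensity. This is only a hedged sketch, not the thesis pipeline: the function name `estimate_vibration_frequency` and the reduction of each frame to its mean intensity are illustrative assumptions, and in practice the signal would be taken from the magnified video.

```python
import numpy as np

def estimate_vibration_frequency(frames, fps):
    """Estimate the dominant vibration frequency of a video clip.

    frames: array of shape (T, H, W) holding grayscale frames.
    fps: capture frame rate in Hz.
    """
    # Collapse each frame to its mean intensity: small periodic motion
    # modulates this 1-D signal at the object's vibration frequency.
    signal = frames.reshape(len(frames), -1).mean(axis=1)
    signal = signal - signal.mean()               # drop the DC component
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    return freqs[np.argmax(spectrum)]             # dominant frequency in Hz

# Synthetic check: a 3 Hz flicker sampled at 30 fps for 3 seconds.
t = np.arange(90) / 30.0
flicker = 0.5 + 0.1 * np.sin(2 * np.pi * 3.0 * t)
frames = flicker[:, None, None] * np.ones((90, 8, 8))
print(estimate_vibration_frequency(frames, fps=30))  # close to 3.0
```

On real footage, the spectral peak is easier to find after magnification, since the motion signal is lifted further above the noise floor.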
2.2 Motivation
After a deep dive into the subject of frequency estimation, I found that many studies have tried to solve this problem, but all of them work only on the original input and try to estimate the frequency on their own. Most of the time they only handle clear and visible object motions and consider all other motions too small to detect, ignoring them completely. By adding a motion magnification layer, we can intensify the motion to a more noticeable degree, opening up the scope of the topic to cases that previous studies have not considered. I believe there is room for improvement, so this thesis chose to explore it.
Everything in this universe is vibrating, oscillating, and resonating at various frequencies. An object might seem motionless, yet every particle that makes it up is in constant motion. Every object has its own vibration frequency; at a certain frequency, objects resonate and create a vibrating motion. This is a natural phenomenon we can observe in real life, like a car engine vibrating or a flower moving ever so slightly in the wind. But most of the time these motions are too small for human eyes to actually see, so we often ignore them and move on with our daily lives. For many decades, humans have tried to measure the vibration of objects with many different tools, from gyroscopes to microscopic sensors, and most recently camera-based sensors that estimate the frequency of moving objects; but this last method is still not developed enough to reliably estimate small motions.
Since the beginning of this field of research, the heavy focus has been on amplifying subtle changes in the signal of every pixel: a change in color value, or the movement of a group of pixels over a period of time. Previous attempts analyze and amplify subtle motions and visualize deformations in video sequences. These approaches follow a Lagrangian [6] perspective, in reference to fluid dynamics, where the trajectory of particles is tracked over time; here, instead of fluid particles, every pixel is treated as an individual particle. Small motions in a video are captured and amplified by identifying pixels in a frame and tracking them through the temporal changes of every frame, allowing visualization of deformations that would otherwise be invisible. But this also opens the floodgates to noise and artifacts polluting the final output, and in terms of algorithmic complexity the computing power required is immense, making the approach highly inflexible compared with later methods, which have proven to have greater potential for more accurate applications.
2.3 Applicability
In this day and age, any kind of data can be processed to give us valuable information about our world, helping us better understand it and situate ourselves in a more advanced position.

With the ability to estimate the vibration frequency of an object, this study can be used as a measurement aid to help users better visualize their surroundings and make better adjustments. It can be used in multiple industries, and its potential is endless. Some of the real-world use cases are:
2.3.1 Applicability in medicine

Hospitals rely on heart rate monitors among other measurement equipment for vital signals. This can be a problem, since a standard heart rate monitor (HRM) or electrocardiogram (EKG/ECG) device can cost anywhere between 110 and 188 US dollars according to MDSAVE.COM [4], a leading healthcare equipment provider based in San Francisco, USA; equipping a room of 10 patients can cost at least 1,100 dollars, or around 27.5 million VND. With the ability to track and magnify the change in a patient's skin color, the deep convolutional neural network can return a reliable estimation of heart rate.

Monitoring an infant is a similar case: attaching sensors to a child's body calls for complex workarounds, like an electronic mat that measures the heartbeat or a small microphone placed close to the baby's mouth. All of these solutions are invasive and not very scalable, but with a motion magnification camera we can provide a non-invasive and reliable method.
We can all remember the shortage of medical equipment during the COVID-19 pandemic. This new technology can play a vital role in future health emergencies, since it can be turned into an app and put online for everyone to use on their phones, opening the chance for wider medical application.
Figure 2.3: The result of video magnification + frequency estimation, shown to be reliable compared with a real medical machine
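As a rough illustration of the pulse-estimation idea above, the sketch below looks for a spectral peak of a skin patch's intensity signal inside a plausible pulse band. The function name, the band limits, and the mean-intensity reduction are assumptions for illustration, not the validated method behind the figure.

```python
import numpy as np

def estimate_heart_rate_bpm(roi_frames, fps, band=(0.7, 4.0)):
    """Estimate pulse rate from a skin region of a (magnified) video.

    roi_frames: (T, H, W) intensities of a skin patch over T frames.
    band: plausible pulse range in Hz (0.7-4.0 Hz is 42-240 bpm),
          used to exclude breathing and high-frequency noise peaks.
    """
    signal = roi_frames.reshape(len(roi_frames), -1).mean(axis=1)
    signal = signal - signal.mean()
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    keep = (freqs >= band[0]) & (freqs <= band[1])    # pulse band only
    return freqs[keep][np.argmax(spectrum[keep])] * 60.0  # Hz -> bpm

# Synthetic pulse at 1.2 Hz (72 bpm): 10 s of 30 fps video of a 4x4 patch.
t = np.arange(300) / 30.0
roi = (100 + 2 * np.sin(2 * np.pi * 1.2 * t))[:, None, None] * np.ones((300, 4, 4))
print(estimate_heart_rate_bpm(roi, fps=30))  # close to 72
```

Restricting the search to a physiological band is what makes this robust: the strongest raw spectral peak in real footage is often camera noise or lighting flicker, not the pulse.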
2.3.2 Applicability in industry
Vibration is a closely monitored factor in every industry. Any moving machine, or moving part in a machine, has the potential to malfunction and move in a way that damages itself and the whole machine; a loose screw can make a machine shake during operation, creating unnecessary wear and tear and shortening the lifetime of the equipment.

For example, in a running motor, one of the base screws may loosen over a long period of operation, resulting in prolonged, small but intensifying vibration. Over time the screw gets looser and looser, finally breaking the base of the pump through violent vibration. This is one of the most common ways to lose a valuable machine, all because of a loose screw.

To combat this problem, many companies choose to put motion sensors on their machines to monitor and track the vibration of the moving parts. But then the problem becomes how to put millions of sensors on millions of moving parts in a factory.

Monitoring these kinds of vibrations also requires a trained technician with expensive measurement tools. The ability for a company to simply buy software and a good security camera to monitor all of its equipment would be an industry-changing, even groundbreaking, solution: no need for expensive monitoring sensors or for putting humans in hazardous environments, since a camera can stream the live feed to the main control room where everything is processed, truly Industry 4.0 style.

Figure 2.4: Example of a machine that can be tracked using a camera
2.3.3 Applicability in astronomy

With powerful telescopes, scientists can track stars and planets moving in orbits light-years away. Based on the frequency of the light coming from a star, they can learn about the composition of a planet and whether it has any moons orbiting it. For a long time, astronomers have pioneered state-of-the-art solutions to this very problem; they are no strangers to this field of study and are among the most productive contributors to the development of mathematics and algorithms. With the rise of artificial intelligence and machine learning, such as this study on Learning-based Video Motion Magnification, a promising future is right ahead: building on the groundwork of astrophysics, we can study and better understand our universe.

Just by magnifying the motion of a star, you can see its spin direction, estimate its speed, and count how many moons it has; the applications are endless.
2.3.4 Sound retrieval from still video
The reason this application can be called a "gimmick" is that there are not many real-world uses for it, except in highly specific situations.

In principle, sound waves move through the air and hit the surface of an object, making it resonate at a corresponding frequency. Since the motion magnification CNN has the ability to estimate the vibration frequency of an object, we can in theory capture the frequency of the original sound wave and then use a Fourier transform step to retrieve the actual sound.
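The chain described above, from a per-frame vibration signal to an audible waveform, can be hinted at with a toy resampling step. This is a loose sketch under strong assumptions: it presumes the displacement signal has already been extracted from the video, and real sound recovery needs a high-speed camera, since half the frame rate bounds the highest recoverable pitch.

```python
import numpy as np

def motion_signal_to_audio(displacement, fps, audio_rate=44100):
    """Turn a per-frame vibration signal into an audio waveform.

    displacement: 1-D array with one motion sample per video frame.
    fps: video frame rate; fps / 2 bounds the highest recoverable
         pitch, which is why real "visual microphone" setups record
         at hundreds or thousands of frames per second.
    Returns samples in [-1, 1] at audio_rate, by linear interpolation.
    """
    x = displacement - displacement.mean()
    peak = np.abs(x).max()
    if peak > 0:
        x = x / peak                               # normalize amplitude
    t_video = np.arange(len(x)) / fps
    n_audio = int(len(x) / fps * audio_rate)       # keep the same duration
    t_audio = np.arange(n_audio) / audio_rate
    return np.interp(t_audio, t_video, x)

# A 440 Hz tone "seen" by a 2200 fps high-speed camera for 0.1 s.
t = np.arange(220) / 2200.0
audio = motion_signal_to_audio(np.sin(2 * np.pi * 440.0 * t), fps=2200)
```

The resulting array could be written to a WAV file for playback; the quality of real recovered sound depends almost entirely on how cleanly the displacement signal was extracted.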
2.4 Challenges and solutions

2.4.1 Challenges

A standard camera like what we have on our phones can barely pick up any motion if it is out of its focus range. There are also the problems of random noise artifacts and border ringing artifacts; these two types of artifacts contaminate almost all input videos and will be magnified in the output video.
Past approaches use complex hand-crafted filters which cannot distinguish what is noise and what is not; everything is just "pixel changes" in the eyes of these filters.
Random noise
A handheld camera records pictures or video using a system of lenses and an electronic sensor that captures light and turns it into digital form. This process involves a load of uncontrollable variations, like static, defects on the lens, or random light reflections inside the camera, all of which show up as small, random noise on the recorded video. There is almost no way to block this kind of random noise in real life; it can only be reduced to an acceptable level.

A handcrafted filter can magnify the motion signal, but it can also magnify the noise, since, as noted, it does not really know what is noise and what is motion.
Ringing artifact
In digital image processing, ringing artifacts are artifacts that appear near sharp transitions in a signal. Visually, they look like bands or "ghosts" near edges. The term "ringing" refers to the way the output signal oscillates at a fading rate around a sharp transition in the input, similar to a bell after being struck. As with other artifacts, their minimization is a criterion in filter design. This behavior comes from the underlying physics and will always be there.
Figure 2.7: Example of a ringing artifact: (a) a sharp image; (b) a picture with a ringing artifact
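Ringing is easy to reproduce in one dimension: an ideal "brick-wall" low-pass filter applied to a sharp step makes the output overshoot and oscillate near the edge (the Gibbs phenomenon), which is exactly the halo seen around image edges. A minimal numpy demonstration:

```python
import numpy as np

# Ringing (Gibbs) in one dimension: keep only the lowest frequencies of
# a sharp step, and the reconstruction overshoots and oscillates at the
# edge instead of reproducing the clean transition.
n = 256
step = np.zeros(n)
step[n // 2:] = 1.0                      # sharp transition

spectrum = np.fft.fft(step)
cutoff = 16                              # number of low-frequency bins kept
spectrum[cutoff:n - cutoff + 1] = 0.0    # ideal "brick-wall" low-pass
filtered = np.fft.ifft(spectrum).real

overshoot = filtered.max() - step.max()  # ideal filters overshoot ~9%
print(f"overshoot past the step: {overshoot:.3f}")
```

The same thing happens per scanline in an image whenever a filter with a very sharp frequency cutoff is involved, which is why steerable-pyramid-based magnification methods tend to ring around strong edges.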
2.4.2 Solutions
Setup solutions
As stated above, this kind of static or noise cannot be eliminated completely, since the nature of the problem lies in random environmental factors and micro-defects in the manufacture of the camera. We can, however, reduce it to a small and acceptable range with the following steps:

- Stabilize the camera: putting the camera on a tripod helps reduce vibration and lets us position the camera at an optimal viewing angle.
- Lighting: a constant, bright light source illuminates the object, reduces blending between it and the background, removes shadows, and helps us better identify all the features of the object.
- Background: a good background can make all the difference; since the model was trained to pick out the change between the background and a moving object, a clear background greatly improves the result.
- A quality camera: the better the camera, the better the output, as better equipment allows recording at greater resolution and in finer detail. But remember that almost all cameras nowadays have some sort of stabilization setting; for our study, turn this feature off.
Algorithm solutions
With the new Learning-based Video Motion Magnification, the algorithm can now understand what is and is not motion and gives back a better result than ever before: fewer ringing artifacts and less random noise. Overall, the output is sharper and more visually pleasing, which plays a great role in getting the best frequency estimation.
2.5 The main goal

Throughout this thesis, our main goal is to learn all we can from this field of study. From the knowledge gathered, we aim to replicate the authors' results and see how well the method operates on our own test data. After this primary goal, we will also try out the real-world uses we envision, then implement the improvements we come up with and test the results.
+ The input and output components are updated to Python.
+ All of the legacy libraries are reconfigured so the code can run on a newer version of Python.

3. Create testing data that can be used by later researchers. After running some tests on the code and data we were provided, we realized that the data for visualization testing, or testing in general, is very limited; most people just reuse the same old low-quality videos. We understand that using the same video helps everyone share the same ground truth, but it would be quite boring to see the same example over and over again.

4. Propose some new ways to modify the model and the input to improve the results and to address cases where the model gives degraded results.
Chapter 3

Background

3.1 Optical flow

Optical flow estimates the apparent motion of objects between frames by comparing the two continuous images.
At this point in time, the Lucas-Kanade optical flow algorithm is one of the best ways to track motion. It works by laying down some rules to follow:
- The two images are part of a time series separated by a small time increment $\Delta t$, such that objects have not displaced significantly (that is, the algorithm works best with slow-moving objects).
- The images depict a natural scene containing textured objects exhibiting shades of gray (different intensity levels) which change smoothly.

The algorithm does not need to use color information, and it does not scan the second image looking for a match to a given pixel. It works by trying to guess in which direction an object has moved so that local changes in intensity can be explained. Of course, a single pixel does not usually contain enough "structure" to be matched against another pixel; it is better to use a neighborhood of pixels, for example the 3 x 3 neighborhood around the pixel (x, y).
For each of the nine pixels in that neighborhood, the brightness-constancy constraint gives one equation:

$I_x(x+\Delta x, y+\Delta y)\,u + I_y(x+\Delta x, y+\Delta y)\,v = -I_t(x+\Delta x, y+\Delta y)$, for $\Delta x, \Delta y \in \{-1, 0, 1\}$.

In matrix form this is $S \binom{u}{v} = t$, where $S$ is a $9 \times 2$ matrix whose rows are $\big(I_x(x+\Delta x, y+\Delta y),\ I_y(x+\Delta x, y+\Delta y)\big)$ and $t$ is the vector containing the nine terms $-I_t(x+\Delta x, y+\Delta y)$.

The above system cannot be solved exactly (in the general case). The least squares solution is found by multiplying the equation by $S^T$,

$S^T S \binom{u}{v} = S^T t,$

and inverting $S^T S$, so that

$\binom{u}{v} = (S^T S)^{-1} S^T t.$
At first this may not sound like optical flow at all, but under the hood it is just a smarter way to implement the algorithm; the idea behind it is the same.

We can say that the Lucas-Kanade algorithm makes a "best guess" of the displacement of a neighborhood by looking at changes in pixel intensity that can be explained by the known intensity gradients of the image in that neighborhood. For a single pixel we have two unknowns (u and v) and one equation, that is, the system is underdetermined, and we need a neighborhood in order to get more equations. Doing so makes the system overdetermined, and we have to find a least squares solution; the LSQ solution averages the optical flow guesses over a neighborhood.

The Lucas-Kanade algorithm is an efficient method for obtaining optical flow information at interesting points in an image (i.e. those exhibiting enough intensity gradient information), and it works for moderate object speeds.
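The least-squares derivation above can be written out directly in a few lines of numpy. This is a didactic single-point, single-iteration sketch (no pyramid, no iterative refinement; the function name and window size are illustrative), not a production tracker:

```python
import numpy as np

def lucas_kanade(prev, curr, x, y, win=3):
    """Least-squares Lucas-Kanade flow for a single point (x, y).

    Builds the overdetermined system S [u v]^T = t over the
    (2*win+1)^2 neighborhood and solves the normal equations,
    exactly as in the derivation above.
    """
    Iy, Ix = np.gradient(prev.astype(float))       # spatial derivatives
    It = curr.astype(float) - prev.astype(float)   # temporal derivative

    ys, xs = np.mgrid[y - win:y + win + 1, x - win:x + win + 1]
    S = np.column_stack([Ix[ys, xs].ravel(), Iy[ys, xs].ravel()])
    t = -It[ys, xs].ravel()
    uv, *_ = np.linalg.lstsq(S, t, rcond=None)     # (S^T S)^-1 S^T t
    return uv                                      # (u, v) in pixels

# Synthetic check: a smooth blob shifted right by exactly one pixel.
xx, yy = np.meshgrid(np.arange(32), np.arange(32))
prev = np.exp(-((xx - 15.0) ** 2 + (yy - 16.0) ** 2) / 40.0)
curr = np.exp(-((xx - 16.0) ** 2 + (yy - 16.0) ** 2) / 40.0)
u, v = lucas_kanade(prev, curr, x=15, y=16)
print(u, v)  # u close to 1, v close to 0
```

Real implementations repeat this solve inside a coarse-to-fine image pyramid, which is what lets the method handle displacements larger than the one-pixel linearization assumes.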
3.2 Motion Magnification (2005)

In this first paper, which can arguably be called the start of this whole branch of study, the authors look at the problem as a way to track not one but all of the pixel movements in the video frame and then follow them through the temporal changes of the video. By doing so, they can capture small motions in the frame and then magnify them, revealing deformations that would otherwise be invisible to human eyes.

Figure 3.2: Description of the sample set (training set)
The paper deals with moving objects by treating them as groups of commonly segmented pixels, based on similarity of position, color, and motion. After tracking all of the pixels and their trajectories, the method groups them into smaller clusters based on their cluster features. This was a novel approach, since no one had framed the problem this way before.

Once the images have been registered, feature points are found and tracked a second time. The goal of this feature tracking is to find the trajectories of a reliable set of feature points to represent the motions in the video. As before, the steps consist of feature point detection, SSD matching, and local Lucas-Kanade refinement.
Figure 3.3: Learned regions of support allow features (a) and (b) to reliably track the leaf and background, respectively, despite partial occlusions. For feature (b) on the stationary background, the plots show the x (left) and y (right) coordinates of the track both with (red) and without (blue) a learned region of support for appearance comparisons. The track using a learned region of support is constant, as desired for a feature point on the stationary background
After segmenting all the layers and determining the background, the user selects the motion layer to be magnified, and the displacements of each pixel in the cluster are multiplied by a selected factor, usually between 4 and 40 in the study.
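The core amplification step of this Lagrangian method is simple once trajectories exist: each tracked point's displacement from its rest position is scaled by the magnification factor. A hedged sketch (the array shapes and the use of the temporal mean as the rest position are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def magnify_trajectories(tracks, alpha=10.0):
    """Lagrangian motion magnification of tracked feature points.

    tracks: (T, N, 2) x-y positions of N feature points over T frames.
    Each point's displacement from its mean (rest) position is scaled
    by alpha; the 2005 paper uses factors of roughly 4 to 40.
    """
    rest = tracks.mean(axis=0, keepdims=True)      # per-point rest position
    return rest + alpha * (tracks - rest)

# One point vibrating horizontally with +/-0.2 px amplitude at 5 Hz.
t = np.linspace(0.0, 1.0, 50, endpoint=False)
xy = np.stack([10.0 + 0.2 * np.sin(2 * np.pi * 5.0 * t),
               np.full(50, 20.0)], axis=1)         # (50, 2) positions
tracks = xy[:, None, :]                            # (50, 1, 2)
magnified = magnify_trajectories(tracks, alpha=10.0)
# The vibration amplitude grows by exactly alpha; the rest position stays.
```

The hard part of the 2005 pipeline is everything around this step: reliable tracking, layer segmentation, and texture synthesis to fill the holes the amplified motion uncovers.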
The claim is that this technique can act like a microscope for visual motion: it can amplify subtle motions in a video sequence, allowing visualization of deformations that would otherwise be invisible. To achieve motion magnification, the method must accurately measure visual motions and group the pixels to be modified. After an initial image registration step, motion is measured by a robust analysis of feature point trajectories, and pixels are segmented based on similarity of position, color, and motion. A novel measure of motion similarity groups even very small motions according to correlation over time, which often relates to a physical cause. An outlier mask marks observations not explained by the layered motion model, and those pixels are simply reproduced in the output from the original registered observations. The motion of any selected layer may be magnified by a user-specified amount, with texture synthesis filling in unseen "holes" revealed by the amplified motions. The resulting motion-magnified images can reveal or emphasize small motions in the original sequence, as demonstrated with deformations in load-bearing structures, subtle motions or balancing corrections of people, and "rigid" structures bending under hand pressure.