VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY

Cell's biomechanical features extraction from very high spatio-temporal videos

Major: Computer Science

AUTHORSHIP
"I hereby declare that the work contained in this thesis is my own and has not been previously submitted for a degree or diploma at this or any other higher education institution. To the best of my knowledge and belief, the thesis contains no materials previously published or written by another person except where due reference or acknowledgement is made."
Signature:………………………………………………

SUPERVISOR'S APPROVAL
"I hereby approve that the thesis in its current form is ready for committee examination as a requirement for the Bachelor of Computer Science degree at the University of Engineering and Technology."
Signature:………………………………………………

ACKNOWLEDGEMENT
Firstly, I would like to express my sincere gratitude to my supervisor, Assoc. Prof. Le Thanh Ha of the University of Engineering and Technology, Vietnam National University, Hanoi, for their instructions, guidance and research experience.
Secondly, I am grateful to my co-supervisor, M.S. Tran Si Hoai Trung of the Division of Solid State Physics and Nano Lund, Department of Physics, Lund University, Sweden, for invaluable assistance and knowledge during our time working together.
Moreover, I am grateful to all the teachers of the University of Engineering and Technology, VNU, for the invaluable lessons I have learnt during my university life.
I would also like to thank my friends in class K59CA, University of Engineering and Technology, VNU.
I greatly appreciate the help and support from the Human Machine Interaction Laboratory of the University of Engineering and Technology during this project.

ABSTRACT
Interdisciplinary research has been a primary area of study in recent years. Computational biology in particular, which includes many aspects of bioinformatics, combines mathematics, statistics and computer science to solve 
biology-based problems. Nowadays, thanks to advanced technology, we are able to collect vast volumes of biological data, faster than we can analyze them. There is a growing need to develop analytical methods for interpreting accurate quantitative biological features from these data. One of the basic kinds of biological data is cell information. Exploiting and interpreting such data is often complicated and requires considerable effort and time. Our project researches cell classification with the goal of breaking new ground in cell classification methods. As a first step, we aim at the pre-analysis of the data: extracting the accurate quantitative features that can be exploited from it. This thesis introduces a new method for extracting cells' biomechanical features to interpret biological cell data. Although the method is simple and has some drawbacks, it yields accurate features that we evaluate. These extracted biomechanical features help us understand and study the cell data available in the project.
Keywords: Bioinformatics, Biomechanical features extraction

SUMMARY (Tóm tắt)
Interdisciplinary research is one of the main directions of development in recent years. Prominent among these fields is computational biology, which covers many aspects of bioinformatics and combines computation, analysis and computer science to solve problems rooted in biology. Today, thanks to modern inventions, we can extract large amounts of biological data, faster than we can process and analyze them. There is a growing need for methods and software that help analyze these data and extract correct quantitative biological values from them. One of the basic kinds of biological data is the biological cell, and studying and exploiting biological cell data has always been a complex and difficult problem. Our project studies the classification of biological cells, with the goal of finding a new breakthrough in cell classification methods. To go deeper, we first need to preprocess the data and extract whatever accurate features we can from the biological cell data. This thesis introduces a new method for extracting the biomechanical features of biological cells. Although the method is simple and still has some limitations, it produces accurate, evaluated features. These feature data help us understand the biological cells in the project more deeply.
Keywords: Bioinformatics, Biomechanical feature extraction

Chapter 1 INTRODUCTION
1.1 Motivation
Nowadays, interdisciplinary research has become a primary focus of study. With the help of modern laboratory technology, biologists are able to collect huge amounts of data, faster than they can analyze it. With improvements in the internet and cloud storage, these data can also be shared and stored among biology research centres. We now have a vast volume of biological data without the means and techniques to interpret it, hence the need to develop new analytical methods for interpreting accurate quantitative features from these biological data.
Computational biology, which includes many aspects of bioinformatics, combines mathematics, statistics and computer science to solve biology-based problems. Computational biologists develop and apply software tools, statistical physics and algorithm design to analyze biological data. With the help of statistical, mathematical and computational software, biologists can make quantitative predictions and interpretations from the available data and explore more sophisticated and highly complex problems.
1.2 Contribution and thesis overview
1.2.1 Contribution

Figure 4.2 Area and perimeter graph example
4.1.3 Video
The video result is the output video shown in the interface. After a video analysis is completed or cancelled, the interface saves a video named "FO_video name.avi". The video contains the input cell frame, a bounding box and four variables showing the current feature results (deformability, rotation speed and translation speed of the analyzed cell). If the deformability is greater than the deform threshold (0.8), the bounding box is blue; otherwise it turns red.
Figure 4.3 Features extraction video
4.2 Discussions
4.2.1 Evaluation
For our shape
features, we cannot judge their accuracy directly from the area and perimeter graphs. However, we can use the deformability of the cell, which is computed from its area and perimeter, to evaluate the shape features. Most round and near-round cells start with a deformability greater than 0.87, so their bounding box is blue. When the cell crosses the panel and hits the pillar, it is compressed and its deform ratio drops to about 0.78, turning the bounding box red. Once the cell passes the pillar, its elastic body returns to its original form and the bounding box turns blue again.
Figure 4.4 Deformation process evaluation: (a) Before the deformation process (b) During the deformation process (c) After the deformation process
For speed features, our method is based on the affine transformation of the cell with two types of movement: translation and rotation. Because cell shapes are inconsistent and varied, we reduce outliers and limit the deformability to avoid analyzing bad cells; this lets us extract near-accurate speed features. We evaluate these speed features in three categories.
First, the graphs. Figures 4.5, 4.6, 4.7 and 4.8 contain six graphs showing the extracted features of a cell. Looking at the rotation speed, the cell starts rotating slowly, speeds up to a peak on top of the pillars, and then quickly decreases back to zero pixels per frame. Because the air pressure between the two pillars is strongest, a cell passing through this gap reaches its peak rotation and horizontal translation speeds there, while its vertical translation speed changes as the cell moves up and down in the video. The video shows two pillars, and the cell travels from the left side to the right side of the video. In Figure 4.5, when the cell appears from the left border of the video, its area and perimeter slowly grow and reach their peak once the cell is completely visible. The large drop in the middle of the area graph comes from the cell's missing part: when the cell falls down between the two pillars, part of it disappears from view. The perimeter behaves the same way but drops less, because the deformability of the cell does not change much.
Figure 4.5 Area, perimeter graph evaluation
In Figure 4.6, we see the deformability of the cell. Because the cell is only partially visible when it enters and leaves the frame, the deformability values extracted at the start and end of its appearance are inaccurate. The graph clearly shows two sections with a deform ratio lower than the deformability threshold (0.8).

300 mbar      500 mbar      700 mbar      900 mbar
6.15135726    8.64949473    15.95904601   25.63372186
6.81898311    9.63603709    15.39571649   32.94198975
7.66388116    13.51178852   15.45696514   30.277758
8.04721913    12.56142627   12.67210463   31.48099373
8.54023425    13.18442388   12.27077237   28.47893959
8.50366929    11.78289583   12.7423041    30.81571737
8.43547253    13.17147526   12.36726858   25.92653887
8.82553278    13.93926457   -             31.81934045
7.406303      12.70228072   -             24.60319644
6.26702406    12.72254161   -             -
-             13.61157819   -             -
-             12.62328029   -             -
-             13.01632335   -             -
Table 4.1 Maximum horizontal translation speed with deformability greater than 0.8

Some datasets, in which most of the cells are defective or irregularly shaped, show a lower maximum horizontal speed than the others because we did not analyze the speed of bad cells.
Third, the accuracy of the rotation center. The cell's center of mass plays a key role in evaluating the correctness of the affine transformation formula. As Figure 4.9 shows, when the cell enters, the center of rotation (its center of mass) is located in the middle of the cell. When the cell hits the pillar, however, the center of mass shifts toward the pillar, showing that instead of rotating around its original center of mass, the cell is rotating around the pillar. After the cell leaves the pillar, its center of mass returns to its original position.
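The evaluation above rests on three per-frame quantities: a deformability computed from area and perimeter, the center of mass used as the rotation center, and the rotation and translation between consecutive frames. This section does not give the exact formulas, so the following is only a sketch under stated assumptions, not the thesis's implementation. Deformability is assumed to be the circularity 4πA/P² (1.0 for a perfect circle); area and center of mass come from the shoelace formula (the discrete form of Green's theorem) on an ordered contour; and rotation and translation come from a least-squares rigid fit (rotation plus translation, a restricted affine transform) to points matched between two frames. The helper names `shape_features`, `rigid_motion` and `box_colour` are hypothetical, and the sketch returns rotation as an angle in radians per frame rather than the pixels per frame mentioned in the text; only the 0.8 threshold comes from the thesis.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]

DEFORM_THRESHOLD = 0.8  # deform ratio threshold quoted in the thesis


def shape_features(contour: Sequence[Point]) -> Dict[str, object]:
    """Area, perimeter, centroid and circularity of a closed polygon.

    `contour` is an ordered list of (x, y) vertices, e.g. a traced cell
    boundary. ASSUMPTION: deformability = circularity 4*pi*A / P**2.
    """
    n = len(contour)
    if n < 3:
        raise ValueError("a contour needs at least 3 vertices")
    twice_area = 0.0
    cx_acc = 0.0
    cy_acc = 0.0
    perimeter = 0.0
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]  # wrap around to close the polygon
        cross = x0 * y1 - x1 * y0      # shoelace term (discrete Green's theorem)
        twice_area += cross
        cx_acc += (x0 + x1) * cross
        cy_acc += (y0 + y1) * cross
        perimeter += math.hypot(x1 - x0, y1 - y0)
    if twice_area == 0.0:
        raise ValueError("degenerate contour with zero area")
    signed_area = twice_area / 2.0
    # Centroid of a uniform polygon; the sign of signed_area cancels out.
    centroid = (cx_acc / (6.0 * signed_area), cy_acc / (6.0 * signed_area))
    area = abs(signed_area)
    return {
        "area": area,
        "perimeter": perimeter,
        "centroid": centroid,
        "deformability": 4.0 * math.pi * area / perimeter ** 2,
    }


def rigid_motion(prev_pts: Sequence[Point],
                 next_pts: Sequence[Point]) -> Tuple[float, Point]:
    """Least-squares rotation angle (radians per frame) about the centroid
    and centroid displacement (pixels per frame) for matched point pairs."""
    n = len(prev_pts)
    if n < 2 or n != len(next_pts):
        raise ValueError("need at least 2 matched point pairs")
    px = sum(p[0] for p in prev_pts) / n
    py = sum(p[1] for p in prev_pts) / n
    qx = sum(q[0] for q in next_pts) / n
    qy = sum(q[1] for q in next_pts) / n
    dot = 0.0
    cross = 0.0
    for (x0, y0), (x1, y1) in zip(prev_pts, next_pts):
        ax, ay = x0 - px, y0 - py  # centered point in the previous frame
        bx, by = x1 - qx, y1 - qy  # centered point in the next frame
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    # Closed-form minimiser of sum |R(theta) a_i - b_i|^2 in 2-D. In image
    # coordinates (y pointing down) a positive angle appears clockwise.
    theta = math.atan2(cross, dot)
    return theta, (qx - px, qy - py)


def box_colour(deformability: float, threshold: float = DEFORM_THRESHOLD) -> str:
    """Blue bounding box above the threshold, red otherwise (Section 4.1.3 rule)."""
    return "blue" if deformability > threshold else "red"
```

For example, a 2 × 2 square outline gives area 4, perimeter 8, centroid (1, 1) and circularity π/4 ≈ 0.785, so under this assumed definition it would fall just below the 0.8 threshold and get a red box, while a finely sampled circle scores close to 1.0.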
Figure 4.9 Evaluating the cell's center of mass: (a) Before cell impact (b) On cell impact (c) Cell at pillar peak (d) After cell impact
4.2.2 Drawbacks
Despite the high accuracy of the results, the work still has three main drawbacks.
First, microcells. Despite their small size, microcells are still detected by the background removal algorithm. Because the method uses morphological transformations (dilation and erosion), a microcell located within kernel-size distance of a biological cell is connected to it by the density cell method. This makes the calculated area, perimeter and deformability of the cell inaccurate.
Figure 4.10 Microcell drawback: (a) Original frame, deform ratio = 0.73 (b) Background removal frame (c) Density cell frame
Second, the same problem applies when there are multiple biological cells. If two biological cells lie within kernel-size distance of each other after background removal, the density cell method connects them. This affects not only the shape features but also the speed features: instead of the speed of a single cell with one center of mass, we compute the speed of two cells with two centers of mass. In most such cases, however, the deformability of the detected object is lower than the deform ratio threshold (0.8), so its features are not recorded in the output data.
Figure 4.11 Multiple cells drawback: (a) Original frame, deform ratio = 0.69 (b) Background removal frame (c) Density cell frame
The final drawback is bad cells. Because the algorithm relies on an affine transformation matrix and rotation around a center point, it mainly targets round and near-round cells to obtain accurate features. Bad cells, such as defective cells and irregularly shaped cells, lead to an inaccurate analysis. To limit this, we also use the deform ratio threshold (0.8) to detect distinctive
bad cells and stop recording their features in memory. However, not all bad cells have low deformability.
Figure 4.12 Bad cell drawback and its background removal data: (a), (b) Defective cell drawback, deform ratio = 0.76; (c), (d) Irregular cell drawback, deform ratio = 0.71; (e), (f) Irregular cell with high deformability, deform ratio = 0.83

Chapter 5 CONCLUSIONS
5.1 Achievement
In this thesis, we successfully present a novel method that combines different algorithms to extract cell features from high spatio-temporal videos. The extracted features are of two types. First, shape features: the area, perimeter and deformability of the target cells. Second, speed features: the translation speed and rotation speed of the cells. Despite some remaining limitations, especially for irregularly shaped cells, we found several suitable solutions to extract near-accurate features. The method not only works on our cell data but can also extract similar features from other sources with the same attributes.
5.2 Future work
The biomechanical features extracted from these biological data can be used as input to a machine learning cell classifier. Developers can feed in their cell data, extract the biomechanical features, and then use them in a machine learning model to further explore and study the data.
The results of this thesis are only initial achievements, and both the data and the method still have drawbacks to address. First, the quality of the biological cell data plays a big role in the accuracy of the analyzed features; we need clearer cell data that does not lose any valuable pixels. Second, despite the accurate quantitative features, the method still handles irregular cells and microcells poorly. Future research and analysis could look for an optimal solution to this problem.
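The microcell weakness mentioned above, detailed in Section 4.2.2, comes from the dilation and erosion step of the density cell method: two foreground regions that are closer than the kernel size can be bridged into one. The sketch below illustrates that effect with a plain morphological closing (dilation followed by erosion) on a tiny binary mask, in pure Python so it runs without an imaging library. It is an illustration under assumptions, not the thesis's density cell implementation: the square kernel, the border handling (pixels outside the image are ignored) and all helper names (`dilate`, `erode`, `close`, `count_blobs`, `two_squares`) are hypothetical.

```python
from collections import deque
from typing import Iterator, List

Mask = List[List[int]]  # binary image: 1 = foreground (cell), 0 = background


def _window(img: Mask, y: int, x: int, r: int) -> Iterator[int]:
    """Pixels in the (2r+1) x (2r+1) window around (y, x), clipped to the image."""
    h, w = len(img), len(img[0])
    for yy in range(max(0, y - r), min(h, y + r + 1)):
        for xx in range(max(0, x - r), min(w, x + r + 1)):
            yield img[yy][xx]


def dilate(img: Mask, k: int) -> Mask:
    """A pixel becomes foreground if any pixel under the k x k kernel is."""
    r = k // 2
    return [[int(any(_window(img, y, x, r))) for x in range(len(img[0]))]
            for y in range(len(img))]


def erode(img: Mask, k: int) -> Mask:
    """A pixel stays foreground only if every pixel under the kernel is."""
    r = k // 2
    return [[int(all(_window(img, y, x, r))) for x in range(len(img[0]))]
            for y in range(len(img))]


def close(img: Mask, k: int) -> Mask:
    """Morphological closing: dilation followed by erosion, k x k square kernel."""
    return erode(dilate(img, k), k)


def count_blobs(img: Mask) -> int:
    """Count 4-connected foreground regions with a breadth-first search."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    blobs = 0
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] and not seen[sy][sx]:
                blobs += 1
                seen[sy][sx] = True
                queue = deque([(sy, sx)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return blobs


def two_squares(gap: int, width: int = 16, height: int = 7) -> Mask:
    """Two 3 x 3 foreground squares on one row, `gap` empty columns apart."""
    img = [[0] * width for _ in range(height)]
    for y in range(2, 5):
        for x in range(2, 5):
            img[y][x] = 1              # left region (the target cell)
            img[y][x + 3 + gap] = 1    # right region (a microcell or second cell)
    return img
```

With a 3 × 3 kernel, two squares separated by a 2-pixel gap become a single region after closing, while squares 3 or more pixels apart stay separate. In general, gaps narrower than the kernel get bridged, which mirrors how a nearby microcell or second cell can be fused with the target cell before its area, perimeter and deformability are measured.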