HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY
SCHOOL OF ELECTRONICS AND TELECOMMUNICATIONS

INTERNSHIP REPORT
Topic: PERSON RE-IDENTIFICATION

Instructor: Dr. Vo Le Cuong
Student: Nguyen Tuan Nghia
Student ID: 20122147
Class: ET AP K57
Hanoi, 3-2017

REVIEWS OF INTERNSHIP REPORT

Student's name: Nguyen Tuan Nghia
Student ID: 20122147
Class: Electronics and Telecoms AP
Course: 57
Instructor: Dr. Vo Le Cuong
Critical Officer:
Internship report content:
Reviews of Critical Officer:
Hanoi, / /2017
Critical Officer (sign and write full name)

INTRODUCTION

The internship is an important phase for every undergraduate student before working on the graduation thesis. Since there are many differences between theory and practice, an internship gives students a chance to develop their knowledge and apply it to practical situations. In addition, it provides students with a view of a professional working environment, in which they also learn to work in a team, to communicate, and to present their work to others. These are essential skills that every engineer should have.

It is getting easier for students, especially those studying electronics and telecommunications, to find an internship nowadays. As there are many high-tech companies in Viet Nam, students can choose ones that match their interests. Being able to work in an environment they expect helps them gain experience much faster than before. If everything goes well, they also have a chance to continue working for the company after graduation.

I am fortunate to have been accepted for an internship under Dr. Vo Le Cuong's instruction at AICS Lab, located in room 618 of the Ta Quang Buu library at Hanoi University of Science and Technology. In this report, I introduce AICS Lab in Section 1. Section 2 focuses on my research during the internship.

I would like to sincerely thank Dr. Cuong and all the staff of the School of Electronics and Telecommunications for helping me complete my internship. I would also like to sincerely thank Prof. Hyuk-Jae Lee of the Computer Architecture & Parallel Processing Lab, Seoul National University, for allowing me to use his workstation. Without his kindness, I could not have conducted any experiments due to the lack of hardware.

ABSTRACT

Person re-identification, the process of recognizing an individual across a camera network, is a fundamental task in automated surveillance and has been receiving attention for years. The task is challenging due to problems such as appearance variations of an individual across different cameras and the low quality of video and image resolution. There have been many proposals to improve the accuracy of this process. In recent years, deep learning based approaches have been shown to outperform most traditional ones for person re-identification. In this report, I introduce the concept of deep learning and propose a method to optimize a multi-shot deep learning based approach for person re-identification using Recurrent Neural Networks. I conduct extensive experiments to compare different architectures and find that the Gated Recurrent Unit is the most effective one, achieving the highest accuracy while having a reasonable number of parameters.

TABLE OF CONTENTS

INTRODUCTION
ABSTRACT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
SECTION 1: AICS LAB
1.1 General information
1.2 Projects and research areas
SECTION 2: INTERNSHIP CONTENT
2.1 Deep learning
2.1.1 The concept
2.1.2 Deep learning for person re-identification
2.2 Multi-shot deep learning methods
2.2.1 Recurrent Neural Network
2.2.2 Long Short Term Memory Network
2.2.3 Gated Recurrent Unit
2.3 Experiment
2.3.1 Caffe
2.3.2 Datasets and evaluation settings
2.3.3 Network implementations
2.3.4 Classifier
2.3.5 Result
2.4 Conclusion
SECTION 3: GRADUATION THESIS PLAN
REFERENCE
APPENDIX: IMPLEMENTATION DETAILS

LIST OF FIGURES

Figure 2.1 Problems when choosing an algorithm to map input x to category y [1]
Figure 2.2 Recurrent Neural Networks with loops [18]
Figure 2.3 Unrolled recurrent neural network [18]
Figure 2.4 The repeating module in a standard RNN [18]
Figure 2.5 RNN makes use of temporal information [18]
Figure 2.6 The problem of long-term dependencies [18]
Figure 2.7 The repeating module in an LSTM [18]
Figure 2.8 Forget gate f [18]
Figure 2.9 Input gate i and candidate vector C̃ [18]
Figure 2.10 Updating cell state C [18]
Figure 2.11 Output gate o and hidden output h [18]
Figure 2.12 Repeating module of Gated Recurrent Unit [18]
Figure 2.13 Experiment procedure
Figure 2.14 Data split settings for PRID-2011
Figure 2.15 LSTM with peephole connections [15] [18]
Figure 2.16 LSTM with coupled gate [18]
Figure 2.17 Recurrent Feature Aggregation Network [11]

LIST OF TABLES

Table 2.1 Performance of different LSTM architectures (rank-1 accuracy)
Table 2.2 Size of different models (caffemodel file)

LIST OF ABBREVIATIONS

CNN   Convolutional Neural Network
GRU   Gated Recurrent Unit
LBP   Local Binary Pattern
LSTM  Long Short Term Memory
RFA   Recurrent Feature Aggregation
RNN   Recurrent Neural Network
SIFT  Scale-Invariant Feature Transform
SVM   Support Vector Machine

SECTION 1: AICS LAB

1.1 General information

AICS Lab, located in room 618 of the Ta Quang Buu library, is a laboratory of the School of Electronics and Telecommunications and belongs to a research center of Hanoi University of Science and Technology. Its research fields include IC design, computer vision and camera sensors. AICS Lab has been making a positive contribution to the development of the School of Electronics and Telecommunications.

AICS Lab was founded in 2010 by Dr. Vo Le Cuong together with a group of members. At first there were many difficulties, such as a shortage of facilities and equipment. However, with youthful energy and a passion for research, the members have earned many achievements and completed numerous projects. Currently, the official members and 10 trainees work on different areas. Members of AICS Lab have a high chance of working for big companies after graduation or of studying abroad in developed countries.

AICS Lab provides an open working environment. The lab room has the essential equipment for working, such as desks and computers. Members can also decorate their own workspaces with anything of interest for convenience. People work on all weekdays, starting in the morning. All work is reported to Dr. Cuong twice a week, through one quick discussion and one longer one. The meeting schedule is decided by both the members and Dr. Cuong via email. In the long meeting, each member presents their work and receives comments on what to do next week. Besides working, members also have extracurricular activities such as eating lunch together.

1.2 Projects and research areas

There are currently three projects and two research areas:

- Lens defect detection (in co-operation with Haesung Vina Co., Ltd): focuses on applying efficient algorithms and building an automated lens defect detection system.
- Rolling door application (in co-operation with Kato Company): focuses on building an application that allows users to operate and control a rolling door and to protect their house from thieves.
- Football player tracker (in co-operation with Vietnam Television): focuses on building an automated system that can recognize and track football players.
- Image processing on FPGA: focuses on implementing image processing algorithms on FPGA for real-time object detection systems.
- Person re-identification: focuses on building an efficient algorithm for the person re-identification task.

SECTION 2: INTERNSHIP CONTENT

2.1 Deep learning

2.1.1 The concept

In computer science, machine learning is a subfield that gives computers the ability to learn without being explicitly programmed. Machine learning explores the study and construction of algorithms that can learn from and make predictions on data [1]. Machine learning is employed in a wide range of computing tasks, including computer vision and pattern recognition.

A machine learning algorithm is an algorithm that is able to learn from data. Mitchell (1997) provides the definition [1]: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." In this context, the task T is generally defined and changes to fit a specific problem. For example, if we want a robot to walk, then walking is the task; if we want a robot to speak, then speaking is the task. Similar to the human learning process, in which an individual improves through experience, a machine builds up experience E by measuring its performance P while trying to solve the task T. Every attempt to solve the task helps the machine learn and gradually construct a model that fits the given data.

Machine learning strictly depends on data. A simple data structure requires only a simple learning algorithm; as the data becomes complicated, building an equally capable algorithm is essential. Trying to model a complicated data structure with a simple algorithm causes the underfitting problem, which results in inaccurate predictions. The idea of deep learning was proposed to meet the needs of such difficult problems.

Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data. The term 'deep' generally describes a property of this kind of learning algorithm: in neural computing, instead of having one hidden layer, a deep feedforward network can have ten times more, in which each layer builds a more abstract representation of the data than the previous one. Applying this kind of learning algorithm to a difficult task like person re-identification is therefore reasonable.

Figure 2.1 Problems when choosing an algorithm to map input x to category y [1]

2.1.2 Deep learning for person re-identification

Person re-identification (person re-id) deals with the problem of recognizing an individual across non-overlapping cameras. When a person appears in a camera, a re-id system should be able to distinguish him from other persons.

2.2 Multi-shot deep learning methods

2.2.1 Recurrent Neural Network

There has been incredible success applying RNNs to a variety of problems, including person re-id. McLaughlin et al. achieved a rank-1 accuracy of 70% [7] on the PRID-2011 dataset by combining RNNs with a CNN in their proposed model.

2.2.2 Long Short Term Memory Network

RNNs connect information from previous inputs to the present one, which is extremely useful for understanding the changes of a person in a video. They are capable of remembering information over time. However, when the time gap grows, RNNs become less effective, since they start to forget relevant information and keep only redundant information [8]. In order to solve this long-term dependencies problem, Hochreiter & Schmidhuber introduced Long Short Term Memory [9], a special kind of RNN.

Figure 2.5 RNN makes use of temporal information [18]
Figure 2.6 The problem of long-term dependencies [18]

LSTM introduces a more complicated structure inside one chunk of the network, with a new output, the cell state, which works like a memory. An LSTM has the ability to remove or add information to the cell state, carefully regulated by structures called gates. Gates are composed of a sigmoid neural network layer and a pointwise multiplication operation. The outputs of the sigmoid layer are between zero and one, describing how much of each component should be passed through: a value of zero means 'let nothing through', while a value of one means 'let everything through'. An LSTM has three gates, named forget (f), input (i) and output (o). As shown in Figure 2.7, all three gates look at both the input x at time step t and the hidden output h of the previous time step t-1 to decide the flow of information.

Figure 2.7 The repeating module in an LSTM [18]

The forget gate allows relevant information stored in the cell state to pass through while forgetting the rest.

Figure 2.8 Forget gate f [18]

After flushing some information, the LSTM looks at the input and decides what to add to the current state. The input gate allows a part of the candidate vector C̃, which is created by a tanh layer, to be added to the current cell state.

Figure 2.9 Input gate i and candidate vector C̃ [18]

The current cell state is then updated by combining the remembered part of the previous cell state with the candidate vector.

Figure 2.10 Updating cell state C [18]

Finally, the hidden output h is decided based on the current cell state, filtered by the output gate o.

Figure 2.11 Output gate o and hidden output h [18]

LSTM has proven its effectiveness in many tasks, such as action recognition [10], including person re-id. Yichao Yan et al. proposed a neural network architecture that uses LSTM [11] to extract sequence-level features of a person. The network is trained and tested on the PRID-2011 dataset with a rank-1 accuracy of 58.2%. Even though this is lower than the RNN-based result of McLaughlin et al. [7], the difference between Yan's and McLaughlin's proposals may not come from the choice between RNN and LSTM, but from the choice of inputs. In addition, Yan's proposal combines simple traditional features with LSTM, resulting in a faster and simpler model that is still able to achieve high accuracy. We do not yet know what happens if we replace the RNN in McLaughlin's proposal with an LSTM.
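To make the gate equations concrete before moving on to the GRU, the following is a minimal NumPy sketch of a single LSTM forward step, written directly from equations (2)-(7) in the appendix. The helper name lstm_step, the weight layout and the toy dimensions are illustrative assumptions; the actual experiments use the Caffe C++ LSTM layer described in Section 2.3.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, C_prev, W, b):
    # W holds the input/hidden weight matrices of the four gated blocks; b holds their biases.
    f = sigmoid(W['fx'] @ x_t + W['fh'] @ h_prev + b['f'])        # forget gate, eq. (2)
    i = sigmoid(W['ix'] @ x_t + W['ih'] @ h_prev + b['i'])        # input gate, eq. (3)
    C_tilde = np.tanh(W['Cx'] @ x_t + W['Ch'] @ h_prev + b['C'])  # candidate vector, eq. (4)
    C = f * C_prev + i * C_tilde                                  # new cell state, eq. (5)
    o = sigmoid(W['ox'] @ x_t + W['oh'] @ h_prev + b['o'])        # output gate, eq. (6)
    h = o * np.tanh(C)                                            # hidden output, eq. (7)
    return h, C

# Toy usage: arbitrary input feature size, hidden size 512 as in the RFA-Net nodes.
x_dim, h_dim = 100, 512
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((h_dim, x_dim if k.endswith('x') else h_dim)) * 0.01
     for k in ['fx', 'fh', 'ix', 'ih', 'Cx', 'Ch', 'ox', 'oh']}
b = {k: np.zeros(h_dim) for k in ['f', 'i', 'C', 'o']}
h, C = np.zeros(h_dim), np.zeros(h_dim)
for t in range(10):                    # L = 10 frames per subsequence
    x_t = rng.standard_normal(x_dim)   # stand-in for one frame's feature vector
    h, C = lstm_step(x_t, h, C, W, b)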
2.2.3 Gated Recurrent Unit

Figure 2.12 Repeating module of Gated Recurrent Unit [18]

Besides the original LSTM architecture, there are some popular LSTM variants used in many papers. Most LSTM variants are only slightly different from the original, so their performance is almost the same. However, a more dramatic variation on LSTM, called the Gated Recurrent Unit [12], is worth mentioning. Instead of using separate forget and input gates, the GRU combines the two into a single update gate; the cell state and hidden state are also merged into one. The resulting model is not only simpler than LSTM but can also achieve notably higher accuracy. During my internship, I tried replacing the LSTM in Yan's proposal [11] with a GRU and achieved a rank-1 accuracy of 61.73% on the PRID-2011 dataset. Details of my work are shown in the next part.
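As a counterpart to the LSTM sketch above, a minimal NumPy version of one GRU step, following equations (19)-(22) in the appendix, could look as follows. It keeps only a hidden state h and needs three pairs of weight matrices instead of four, which is where the reduction in parameters comes from; the function name and weight layout are again assumptions made for illustration.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, b):
    z = sigmoid(W['zx'] @ x_t + W['zh'] @ h_prev + b['z'])               # update gate, eq. (19)
    r = sigmoid(W['rx'] @ x_t + W['rh'] @ h_prev + b['r'])               # reset gate, eq. (20)
    h_tilde = np.tanh(W['Hx'] @ x_t + W['Hh'] @ (r * h_prev) + b['H'])   # candidate, eq. (21)
    return (1.0 - z) * h_prev + z * h_tilde                              # new hidden state, eq. (22)

# Ignoring biases, the GRU needs 3 input and 3 hidden weight matrices per layer,
# while the LSTM step above needs 4 of each; this is consistent with the smaller
# RFA+GRU caffemodel size reported in Table 2.2.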
2.3 Experiment

In order to test LSTM performance on the person re-id task, I followed the evaluation settings of [13] and the approach of [11], in which extracted LBP and Color features are used to train a Recurrent Feature Aggregation network (RFA-Net). The resulting model is used to extract sequence-level features of the persons in the test set, and a metric learning method then uses these multi-shot features to measure the performance of the whole system. I re-did the whole process. In addition, I tested the performance of a new system in which the LSTM was replaced with various LSTM modifications, including GRU, and obtained reasonable results.

Figure 2.13 Experiment procedure

2.3.1 Caffe

Caffe is a deep learning framework developed by the Berkeley Vision and Learning Center and by community contributors [14]. Due to its popularity and strong community, Caffe was chosen over TensorFlow or Theano. Working with Caffe is straightforward: simple tasks such as fine-tuning or training a toy model do not require coding, only specifying the architecture and its hyperparameters for training. Most commonly used features are implemented in Caffe, including an LSTM layer. However, I implemented GRU and the other LSTM variants myself, since they are not yet available. As a result, I gained a great deal of knowledge about the deep learning workflow, most of it about the forward and backward implementation of a layer.

2.3.2 Datasets and evaluation settings

The two most popular datasets for video-based (or multi-shot) person re-id are PRID-2011 and iLIDS-VID. Both of them are used to evaluate the performance of most proposed methods.

PRID-2011 dataset. The PRID-2011 dataset includes 400 image sequences of 200 persons from two cameras. Each image sequence has a variable length of 5 to 675 image frames, with an average of about 100. The images were captured in an uncrowded outdoor environment with a relatively simple and clean background and rare occlusion; however, there are significant viewpoint and illumination variations as well as color inconsistency between the two views.

iLIDS-VID dataset. The iLIDS-VID dataset contains 600 image sequences of 300 persons in two non-overlapping camera views. Each image sequence has a variable length of 23 to 192 frames, with an average of 73. This dataset was created at an airport arrival hall under a multi-camera CCTV network, and the images were captured with significant background clutter, occlusions, and viewpoint/illumination variations, which makes the dataset very challenging.

Following [13], only the sequence pairs with more than 21 frames are used in my experiments. The whole set of human sequence pairs of each dataset is randomly split into two subsets of equal size, one for training and the other for testing. The sequences from the first camera are used as the probe set, while the gallery set comes from the other camera. For both datasets, I report the rank-1 accuracy averaged over 10 trials.

Figure 2.14 Data split settings for PRID-2011
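The split and scoring protocol above can be summarized in a short sketch. The snippet below is an illustration under simplifying assumptions rather than the exact evaluation code: it halves the identities at random into training and test sets and computes rank-1 accuracy from a probe-gallery distance matrix, whereas the real pipeline obtains that matrix from the learned metric applied to RFA-Net features. The figure of 178 usable PRID-2011 pairs is an assumption consistent with the N = 89 training classes mentioned in Section 2.3.3.

import numpy as np

def split_identities(num_persons, seed):
    # Randomly split identities 50/50 into training and test sets (one trial).
    rng = np.random.default_rng(seed)
    ids = rng.permutation(num_persons)
    half = num_persons // 2
    return ids[:half], ids[half:]

def rank1_accuracy(dist):
    # dist[i, j]: distance between probe sequence i (camera A) and gallery
    # sequence j (camera B); probe i and gallery i belong to the same person.
    predicted = dist.argmin(axis=1)
    return float(np.mean(predicted == np.arange(dist.shape[0])))

# Example over 10 trials with random distances (placeholder for real features).
num_persons, accs = 178, []
for trial in range(10):
    train_ids, test_ids = split_identities(num_persons, seed=trial)
    # train_ids would be used to train RFA-Net and the metric; omitted here.
    dist = np.random.default_rng(trial).random((len(test_ids), len(test_ids)))
    accs.append(rank1_accuracy(dist))
print("mean rank-1 over 10 trials:", np.mean(accs))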
2.3.3 Network implementations

LSTM variants implementation. The original LSTM architecture is already implemented in Caffe. In order to compare the other architectures with the original one, they were written in C++ based on the Caffe API.

LSTM with peephole connections. Peephole connections allow the gates to look at the cell state and tune the filtering. Adding peephole connections results in a bigger model, as the number of parameters increases.

Figure 2.15 LSTM with peephole connections [15] [18]

LSTM with coupled forget and input gate. Instead of separately deciding what to forget and what to pass through, the coupled gate replaces the flushed information with new information at once.

Figure 2.16 LSTM with coupled gate [18]

RFA-Net implementation. When implemented in Caffe, RFA-Net contains the LSTM and some data preparation layers. In the training phase, the outputs of the LSTM are fed through a fully connected layer and then a Softmax layer to calculate the loss. In the testing phase, the outputs of the LSTM are used directly to train and test the classifier. In my experiment, the outputs of the LSTM from all nodes are fused. Each node outputs a vector of 512 elements, so the sequence-level feature vector contains 512 × L elements, where L is the length of an image sequence.

Figure 2.17 Recurrent Feature Aggregation Network [11]

Network training. The sequence of image-level features is input to an LSTM network for sequential feature fusion. The network is trained as a classification problem of N classes, where N is the number of persons (N = 89 for the PRID-2011 dataset and N = 150 for the iLIDS-VID dataset). In my experiments, L = 10 was used as the number of frames for each subsequence, which means that features of 10 consecutive steps are learned. Training for 30000 iterations took approximately 45 minutes on a Titan X GPU card. The loss when training GRU converged to zero faster than when training LSTM, and the resulting GRU model is also smaller than the LSTM one.

2.3.4 Classifier

The classifier is used in the testing phase, after a set of features has been obtained for each person. The Support Vector Machine is chosen due to its popularity.

Support Vector Machine. A Support Vector Machine (SVM) is a supervised learning algorithm that analyzes data for classification problems. Given training data and labels, SVM builds a model that maps a sample to one of the categories it has learned. SVM can efficiently perform non-linear classification using kernels, implicitly mapping the inputs into high-dimensional feature spaces. SVM can be used to solve various practical problems and is very common in computer vision and pattern recognition.
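As an illustration of this step, the sketch below fits a linear SVM with scikit-learn on fused sequence-level features. The toy protocol (fitting on gallery-camera subsequences of the test identities and predicting probe identities), the variable names and the use of LinearSVC are assumptions made for the example, not the exact classifier setup of the original experiments.

import numpy as np
from sklearn.svm import LinearSVC

# Fused sequence-level features: one row of 512 * L values per subsequence,
# labeled with the identity of the person it belongs to (random stand-ins here).
L, feat_dim = 10, 512
num_test_persons = 89                      # PRID-2011 test split size
rng = np.random.default_rng(0)

X_gallery = rng.standard_normal((num_test_persons * 4, feat_dim * L))  # 4 subsequences per person (toy)
y_gallery = np.repeat(np.arange(num_test_persons), 4)

clf = LinearSVC(C=1.0, max_iter=10000)
clf.fit(X_gallery, y_gallery)

# Each probe-camera subsequence is assigned the identity predicted from its
# fused feature vector; matching the true identity counts as a rank-1 hit.
X_probe = rng.standard_normal((num_test_persons, feat_dim * L))
pred = clf.predict(X_probe)
print("toy rank-1:", np.mean(pred == np.arange(num_test_persons)))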
2.3.5 Result

The reason for using RFA-Net in the experiments is that its core layer is the LSTM. In combination with the simple LBP&Color features as input and an SVM classifier, it is reasonable to say that the LSTM itself is powerful and suitable for the person re-id problem, since its performance is barely affected by the other factors. The results show that the slight variations of LSTM perform almost the same as the original one: they give better results on iLIDS-VID but lower ones on PRID-2011, with a difference of up to 2%. The coupled model is smaller, while the peephole model is a bit larger than the original LSTM. This result matches the one shown in the survey by Klaus Greff et al. [16, 17]. In the case of GRU, the model not only performs notably better than the three other models but also has the smallest size. In general, GRU is the most effective and efficient architecture for person re-id, reducing training time and the number of parameters while giving better results.

Table 2.1 Performance of different LSTM architectures (rank-1 accuracy, %)

  Method                     PRID-2011   iLIDS-VID
  Color&LBP+LSTM+SVM         58.05       39.32
  Color&LBP+Coupled+SVM      55.34       40.14
  Color&LBP+Peephole+SVM     55.45       40.12
  Color&LBP+GRU+SVM          61.73       43.34

Table 2.2 Size of different models (caffemodel file)

  Model           Size
  RFA+LSTM        475,884 KB
  RFA+Coupled     356,958 KB
  RFA+Peephole    477,932 KB
  RFA+GRU         355,934 KB

2.4 Conclusion

In this section, I presented the deep learning concept and multi-shot deep learning based approaches for the person re-id task. In general, deep learning aims to model high-level abstractions in data. For person re-id, it has shown its effectiveness by being capable of extracting a unique set of features and remaining stable across different datasets. Using temporal information in addition boosts the performance even more. I followed [11, 13] and proposed using GRU and other LSTM variants to further assess the performance of a multi-shot person re-id system. The results are reasonable and match previous research.

SECTION 3: GRADUATION THESIS PLAN

My graduation thesis will continue with deep learning based multi-shot methods for person re-identification, mainly focusing on the LSTM and GRU architectures. Testing with other types of features is one task that will be done next. One type of feature that I expect to boost the performance of the RFA network is the CNN feature; since it does not perform well yet, there is more work to do. For better motion analysis, I plan to make use of Optical Flow in combination with spatial features, with the expectation that this would further increase the accuracy of the task. In addition, new LSTM variants can be invented and tested in parallel.

During my internship, I gained experience of a professional working environment. I learned how a research project is carried out, from analysis to developing a proposal. I also had a chance to practice working skills, including teamwork, reporting and presentation. I consider my internship successful: I gained a great deal of knowledge and experience, which will be extremely important for my future.

REFERENCE

[1] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, http://www.deeplearningbook.org
[2] M. Farenzena, L. Bazzani, A. Perina, V. Murino, and M. Cristani, "Person re-identification by symmetry-driven accumulation of local features," in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010, pp. 2360–2367.
[3] D. Gray and H. Tao, "Viewpoint invariant pedestrian recognition with an ensemble of localized features," in European Conference on Computer Vision. Springer, 2008, pp. 262–275.
[4] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[5] T. Ojala, M. Pietikainen, and T. Maenpaa, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 971–987, 2002.
[6] Z. Zheng, L. Zheng, and Y. Yang, "A discriminatively learned CNN embedding for person re-identification," arXiv preprint arXiv:1611.05666, 2016.
[7] N. McLaughlin, J. Martinez del Rincon, and P. Miller, "Recurrent convolutional network for video-based person re-identification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1325–1334.
[8] Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157–166, 1994.
[9] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[10] J. Liu, A. Shahroudy, D. Xu, and G. Wang, "Spatio-temporal LSTM with trust gates for 3D human action recognition," in European Conference on Computer Vision. Springer, 2016, pp. 816–833.
[11] Y. Yan, B. Ni, Z. Song, C. Ma, Y. Yan, and X. Yang, "Person re-identification via recurrent feature aggregation," in European Conference on Computer Vision. Springer, 2016, pp. 701–716.
[12] K. Cho, B. Van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," arXiv preprint arXiv:1406.1078, 2014.
[13] T. Wang, S. Gong, X. Zhu, and S. Wang, "Person re-identification by video ranking," in European Conference on Computer Vision. Springer, 2014, pp. 688–703.
[14] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," in Proceedings of the 22nd ACM International Conference on Multimedia. ACM, 2014, pp. 675–678.
[15] F. A. Gers and J. Schmidhuber, "Recurrent nets that time and count," in Neural Networks, 2000. IJCNN 2000, Proceedings of the IEEE-INNS-ENNS International Joint Conference on. IEEE, 2000, pp. 189–194.
[16] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber, "LSTM: A search space odyssey," IEEE Transactions on Neural Networks and Learning Systems, 2016.
[17] W. Zaremba, "An empirical exploration of recurrent network architectures," 2015.
[18] http://colah.github.io/posts/2015-08-Understanding-LSTMs/, last visited: 3/8/2017.
[19] R. Kohavi and F. Provost, "Glossary of terms," Machine Learning, vol. 30, no. 2-3, pp. 271–274, 1998.

APPENDIX: IMPLEMENTATION DETAILS

Recurrent Neural Network

  h_t = tanh(W_x x_t + W_h h_{t-1})                              (1)

Long Short Term Memory Network

  f_t = σ(W_fx x_t + W_fh h_{t-1} + b_f)                         (2)
  i_t = σ(W_ix x_t + W_ih h_{t-1} + b_i)                         (3)
  C̃_t = tanh(W_Cx x_t + W_Ch h_{t-1} + b_C)                      (4)
  C_t = f_t * C_{t-1} + i_t * C̃_t                                (5)
  o_t = σ(W_ox x_t + W_oh h_{t-1} + b_o)                         (6)
  h_t = o_t * tanh(C_t)                                          (7)

Long Short Term Memory Network with coupled gate

  f_t = σ(W_fx x_t + W_fh h_{t-1} + b_f)                         (8)
  C̃_t = tanh(W_Cx x_t + W_Ch h_{t-1} + b_C)                      (9)
  C_t = f_t * C_{t-1} + (1 − f_t) * C̃_t                          (10)
  o_t = σ(W_ox x_t + W_oh h_{t-1} + b_o)                         (11)
  h_t = o_t * tanh(C_t)                                          (12)

Long Short Term Memory Network with peephole connections

  f_t = σ(W_fx x_t + W_fh h_{t-1} + W_fc C_{t-1} + b_f)          (13)
  i_t = σ(W_ix x_t + W_ih h_{t-1} + W_ic C_{t-1} + b_i)          (14)
  C̃_t = tanh(W_Cx x_t + W_Ch h_{t-1} + b_C)                      (15)
  C_t = f_t * C_{t-1} + i_t * C̃_t                                (16)
  o_t = σ(W_ox x_t + W_oh h_{t-1} + W_oc C_t + b_o)              (17)
  h_t = o_t * tanh(C_t)                                          (18)

Gated Recurrent Unit

  z_t = σ(W_zx x_t + W_zh h_{t-1} + b_z)                         (19)
  r_t = σ(W_rx x_t + W_rh h_{t-1} + b_r)                         (20)
  h̃_t = tanh(W_Hx x_t + W_Hh (h_{t-1} * r_t) + b_H)              (21)
  h_t = (1 − z_t) * h_{t-1} + z_t * h̃_t                          (22)
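For completeness, the coupled-gate and peephole variants above can be rendered in NumPy as follows, following equations (8)-(18). As with the earlier sketches, the function names and the choice of elementwise (diagonal) peephole weights are illustrative assumptions rather than the Caffe C++ layers used in the experiments.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coupled_lstm_step(x_t, h_prev, C_prev, W, b):
    # Coupled forget/input gate: whatever is forgotten is replaced at once, eqs. (8)-(12).
    f = sigmoid(W['fx'] @ x_t + W['fh'] @ h_prev + b['f'])
    C_tilde = np.tanh(W['Cx'] @ x_t + W['Ch'] @ h_prev + b['C'])
    C = f * C_prev + (1.0 - f) * C_tilde
    o = sigmoid(W['ox'] @ x_t + W['oh'] @ h_prev + b['o'])
    return o * np.tanh(C), C

def peephole_lstm_step(x_t, h_prev, C_prev, W, b):
    # Peephole connections let the gates look at the cell state, eqs. (13)-(18).
    # W['fc'], W['ic'], W['oc'] are taken as vectors (diagonal peephole weights).
    f = sigmoid(W['fx'] @ x_t + W['fh'] @ h_prev + W['fc'] * C_prev + b['f'])
    i = sigmoid(W['ix'] @ x_t + W['ih'] @ h_prev + W['ic'] * C_prev + b['i'])
    C_tilde = np.tanh(W['Cx'] @ x_t + W['Ch'] @ h_prev + b['C'])
    C = f * C_prev + i * C_tilde
    o = sigmoid(W['ox'] @ x_t + W['oh'] @ h_prev + W['oc'] * C + b['o'])
    return o * np.tanh(C), C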