Master's thesis: Generating descriptive sentences for images using a language model

VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY

ĐÀO BẢO LINH

GENERATING DESCRIPTIVE SENTENCES FOR IMAGES USING A LANGUAGE MODEL

Major: Information Technology
Specialization: Software Engineering
Code: 60.48.01.03

MASTER'S THESIS IN INFORMATION TECHNOLOGY

SCIENTIFIC SUPERVISOR: Assoc. Prof. Dr. LÊ ANH CƯỜNG

Hanoi - 2015

ACKNOWLEDGMENTS

First of all, I would like to thank my supervisor, Assoc. Prof. Dr. Lê Anh Cường (University of Engineering and Technology), who guided me directly and gave me the best conditions to complete this thesis.

I would like to thank Assoc. Prof. Dr. Yusuke Miyao (National Institute of Informatics), who guided and supported me while I researched this topic in Japan.

I sincerely thank the lecturers of the University of Engineering and Technology, who taught and guided me and supported my study and practical work at the university.

Finally, I thank all my classmates and my family, who supported and helped me throughout the time I worked on this thesis.

Hanoi, 2015
Student
Đào Bảo Linh

DECLARATION

I declare that the thesis titled “Generating descriptive sentences for images using a language model” is my own research work. The data and results presented in the thesis are entirely truthful and have not been published in any other work.

I have fully cited the references and the related research, both domestic and international. In the content of the thesis, I have stated clearly and accurately what my own contributions are.

The thesis was completed while I was a graduate student at the Faculty of Information Technology, University of Engineering and Technology, Vietnam National University, Hanoi.

Student
Đào Bảo Linh

TABLE OF CONTENTS

Table of contents
List of abbreviations
Terms used
List of tables
List of figures
INTRODUCTION
Chapter 1. PROBLEM DESCRIPTION
  1.1 The problem and its significance
  1.2 Related work
  1.3 Scope of the thesis
Chapter 2. METHODS
  2.1 Corpus
    2.1.1 Types of corpora
    2.1.2 Corpus structure
    2.1.3 Annotation
    2.1.4 Using a corpus
  2.2 Language models
    2.2.1 Overview
    2.2.2 Importance of N-gram language models
    2.2.3 N-gram language models
    2.2.4 Markov chains
    2.2.5 Probability estimation
    2.2.6 Evaluating a probabilistic model by perplexity
  2.3 Search algorithms
    2.3.1 Breadth-first search
    2.3.2 Depth-first search
    2.3.3 Best-first search
Chapter 3. PROBLEM MODEL
  3.1 Model overview
  3.2 Object detection
    3.2.1 Region proposals
    3.2.2 Feature extraction
  3.3 Generating descriptive sentences for objects
Chapter 4. EXPERIMENTS
  4.1 Installation environment
  4.2 Experimental process
    4.2.1 Object recognition
    4.2.2 Training corpus
    4.2.3 Results of the sentence generation system
CONCLUSION
REFERENCES
APPENDIX

LIST OF ABBREVIATIONS

Abbreviation	Full form
SVM	Support Vector Machine
NLP	Natural Language Processing
CNN	Convolutional Neural Network
OWs	Other words
BFS	Best-first search

TERMS USED

English	Vietnamese
Breadth-first search	Tìm kiếm theo chiều rộng
Depth-first search	Tìm kiếm theo chiều sâu
Graph-based segmentation	Phân đoạn ảnh dựa trên đồ thị
Bag-of-words	Tập từ rời rạc
Dataset	Tập dữ liệu
Computer vision	Thị giác máy
Text description	Văn bản miêu tả
Corpus	Tập văn bản
Annotation	Chú thích
Perplexity	Độ hỗn loạn thông tin

LIST OF TABLES

Table 2.2-1 Estimated probability of a word appearing after the corresponding phrase in the corpus

REFERENCES

Vietnamese

[1] Phạm Thọ Hoàn, Phạm Thị Anh Lê (2011). Giáo trình trí tuệ nhân tạo. Khoa Công nghệ thông tin, Trường Đại học Sư Phạm Hà Nội, pp. 10-31.
[2] Đinh Mạnh Tường (2002). Giáo trình trí tuệ nhân tạo. NXB Khoa học Kỹ thuật, pp. 16-41.
[3] Nguyễn Duy Tiến (2000). Các mô hình xác suất và ứng dụng. NXB Đại học Quốc gia Hà Nội, pp. 11-32.
[4] Đặng Hùng Thắng (2007). Quá trình ngẫu nhiên và tính toán ngẫu nhiên. NXB Đại học Quốc gia Hà Nội, pp. 5-6.

English

[5] B. Z. Yao, X. Yang, L. Lin, M. W. Lee, and S.-C. Zhu (2010). I2T: Image Parsing to Text Description. Proceedings of the IEEE, pp. 1485–1508.
[6] Y. Ushiku, T. Harada, and Y. Kuniyoshi (2012). Efficient Image Annotation for Automatic Sentence Generation. ACM MM.
[7] Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh K. Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao (2015). From Captions to Visual Concepts and Back. Microsoft Research.
[8] Stuart J. Russell and Peter Norvig (2009). Artificial Intelligence: A Modern Approach, 3rd edition. Prentice Hall, Upper Saddle River, New Jersey.
[9] I. Endres and D. Hoiem (2010). Category independent object proposals. In ECCV.
[10] J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders (2013). Selective search for object recognition. IJCV.
[11] Ross Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR 2014.
[12] H. Harzallah, F. Jurie, and C. Schmid (2009). Combining efficient object localization and image classification. In ICCV.
[13] P. Arbeláez, M. Maire, C. Fowlkes, and J. Malik (2011). Contour detection and hierarchical image segmentation. TPAMI.
[14] P. F. Felzenszwalb and D. P. Huttenlocher (2004). Efficient Graph-Based Image Segmentation. IJCV, pp. 167–181.
[15] Yoshitaka Ushiku, Tatsuya Harada, and Yasuo Kuniyoshi (2011). Understanding images with natural sentences. ACM Multimedia, ACM, pp. 679–682.
[16] A. Krizhevsky, I. Sutskever, and G. Hinton (2012). ImageNet classification with deep convolutional neural networks. In NIPS.
[17] Richard Szeliski (2010). Computer Vision: Algorithms and Applications. Springer, pp. 655-656.
[18] Daniel Jurafsky, James H. Martin (2009). Speech and Language Processing, 2nd edition.

Websites

[19] http://viet.jnlp.org/kien-thuc-co-ban-ve-xu-ly-ngon-ngu-tu-nhien/mohinh-ngon-ngu
[20] https://en.wikipedia.org/wiki/Computer_vision

APPENDIX

Sentence generation program:

    # The listing was flattened in extraction; line breaks and indentation are
    # reconstructed from the control flow, and the syntax is updated to Python 3.
    # Assumption: the sentence markers were stripped from the listing and are
    # restored here as "<s>" and "</s>", the usual ARPA-format tokens.
    from operator import itemgetter


    class SentenceGenerator:

        def __init__(self, model):
            """Load the 2-gram section of an ARPA-format language model file."""
            self.biGram = {}        # word -> [(next word, log-probability), ...]
            lm = {}                 # trigram table (built but not used below)
            self.startBiGram = {}   # "<s>" -> [(first word, log-probability), ...]
            start = {}              # sentence-initial trigrams (not used below)
            biGrams = False
            triGrams = False
            with open(model, 'r') as lmfile:
                for line in lmfile:
                    line = line.rstrip()
                    if line[:5] == '\\end\\':
                        break
                    if line == "":
                        # A blank line closes the current n-gram section.
                        biGrams = False
                        triGrams = False
                        continue
                    if biGrams:
                        fields = line.split()
                        value = (fields[2], float(fields[0]))
                        if fields[1] == "<s>":
                            self.startBiGram.setdefault(fields[1], []).append(value)
                        else:
                            self.biGram.setdefault(fields[1], []).append(value)
                    if triGrams:
                        fields = line.split()
                        key = (fields[1], fields[2])
                        value = (fields[3], float(fields[0]))
                        if fields[1] == "<s>":
                            start.setdefault(key, []).append(value)
                        else:
                            lm.setdefault(key, []).append(value)
                    if line[:9] == '\\2-grams:':
                        biGrams = True
                    elif line[:9] == '\\3-grams:':
                        triGrams = True

        @classmethod
        def genSentences(cls, bagOfWord, startBiGram, biGram, timer):
            sentence = []
            start_bi_list = list(startBiGram.keys())
            w1 = start_bi_list[0]
            nextWord = startBiGram[w1]
            lastWord = ""
            oNode = [(w1, 0)]   # open nodes: (partial sentence, score)
            cNode = []          # closed nodes
            tmp = []
            k = 0               # index of the next bag word to place
            flag = 0
            eob = ""            # last word of the bag
            t = 0
            while True:
                if not bagOfWord:
                    break
                nextWord.sort(key=itemgetter(1), reverse=True)
                oNode.sort(key=itemgetter(1), reverse=True)
                if not oNode:
                    print("Failure")
                    break
                if lastWord == "</s>":
                    print("Finished!")
                    break
                if flag == 0:
                    # First step: expand the start symbol with every first word.
                    for i in range(len(nextWord)):
                        string = w1 + " " + nextWord[i][0]
                        value = nextWord[i][1]
                        oNode.append((string, value))
                        tmp.append(nextWord[i][0])
                    oNode.pop(0)
                    flag = 1
                    lastWord = oNode[0][0].rsplit(None, 1)[-1]
                elif flag == 1:
                    for i in range(len(nextWord)):
                        tmp.append(nextWord[i][0])
                    del tmp[:]
                    if bagOfWord[k] in oNode[0][0]:
                        # The best open node already contains the current bag word.
                        print("Found word:", bagOfWord[k])
                        lastWord = bagOfWord[k]
                        k = k + 1
                        cNode.append(oNode[0])
                        del oNode[:]
                        oNode.append(cNode[-1])
                        eob = bagOfWord[-1]
                    else:
                        del tmp[:]
                        for item in nextWord:
                            tmp.append(item[0])
                        if bagOfWord[k] in tmp:
                            # The bag word can follow directly: expand with it only.
                            for item in nextWord:
                                if item[0] == bagOfWord[k]:
                                    string = oNode[0][0] + " " + item[0]
                                    value = oNode[0][1] + item[1]
                                    oNode.append((string, value))
                        else:
                            # Otherwise expand with every word not yet used.
                            for item in nextWord:
                                if item[0] not in oNode[0][0]:
                                    string = oNode[0][0] + " " + item[0]
                                    value = oNode[0][1] + item[1]
                                    oNode.append((string, value))
                        oNode.pop(0)
                        oNode.sort(key=itemgetter(1), reverse=True)
                        lastWord = oNode[0][0].rsplit(None, 1)[-1]
                        while lastWord == "</s>":
                            if eob in oNode[0]:
                                string = oNode[0][0] + " " + lastWord
                                value = oNode[0][1]
                                oNode.append((string, value))
                                sentence = oNode[0]
                                print("Finished!")
                                break
                            else:
                                oNode.pop(0)
                                lastWord = oNode[0][0].rsplit(None, 1)[-1]
                    nextWord = biGram[lastWord]
                    del tmp[:]
                    # All bag words are placed: keep expanding to the sentence end.
                    while k > (len(bagOfWord) - 1):
                        for item in nextWord:
                            if item[0] not in oNode[0][0]:
                                string = oNode[0][0] + " " + item[0]
                                value = oNode[0][1] + item[1]
                                oNode.append((string, value))
                        oNode.pop(0)
                        oNode.sort(key=itemgetter(1), reverse=True)
                        lastWord = oNode[0][0].rsplit(None, 1)[-1]
                        if lastWord == "</s>":
                            sentence = oNode[0]
                            break
                        nextWord = biGram[lastWord]
                if t == timer:
                    print("Timeout:", oNode)
                    break
                t = t + 1
            del oNode[:]
            del cNode[:]
            return sentence

        # Choose the word with the highest probability after the beginning of
        # the sentence.
        # Output: [('<s> The cat', -3.117223), ('<s> a dog', -3.1544452),
        #          ('<s> The girl', -3.3955659999999996), ('<s> An apple', -3.692948)]
        @classmethod
        def getFirstWord(cls, bagOfWord, startBiGram, biGram, timer):
            start_bi_list = list(startBiGram.keys())
            cNode = []
            tmp = []
            lst = []
            k = 0
            eob = ""
            for item in bagOfWord:
                w1 = start_bi_list[0]
                nextWord = startBiGram[w1]
                lastWord = ""
                oNode = [(w1, 0)]
                flag = 0
                t = 0
                while True:
                    if not bagOfWord:
                        break
                    nextWord.sort(key=itemgetter(1), reverse=True)
                    oNode.sort(key=itemgetter(1), reverse=True)
                    if not oNode:
                        print("Failure")
                        break
                    if lastWord == "</s>":
                        print("Finished!")
                        break
                    if flag == 0:
                        for i in range(len(nextWord)):
                            string = w1 + " " + nextWord[i][0]
                            value = nextWord[i][1]
                            oNode.append((string, value))
                            tmp.append(nextWord[i][0])
                        oNode.pop(0)
                        flag = 1
                        lastWord = oNode[0][0].rsplit(None, 1)[-1]
                    elif flag == 1:
                        for i in range(len(nextWord)):
                            tmp.append(nextWord[i][0])
                        del tmp[:]
                        if bagOfWord[k] in oNode[0][0]:
                            # Record the best prefix that reaches this bag word.
                            print("Found word:", bagOfWord[k])
                            lastWord = bagOfWord[k]
                            k = k + 1
                            lst.append((oNode[0][0], oNode[0][1]))
                            del oNode[:]
                            del cNode[:]
                            break
                        else:
                            del tmp[:]
                            for item in nextWord:
                                tmp.append(item[0])
                            if bagOfWord[k] in tmp:
                                for item in nextWord:
                                    if item[0] == bagOfWord[k]:
                                        string = oNode[0][0] + " " + item[0]
                                        value = oNode[0][1] + item[1]
                                        oNode.append((string, value))
                            else:
                                for item in nextWord:
                                    if item[0] not in oNode[0][0]:
                                        string = oNode[0][0] + " " + item[0]
                                        value = oNode[0][1] + item[1]
                                        oNode.append((string, value))
                            oNode.pop(0)
                            oNode.sort(key=itemgetter(1), reverse=True)
                            lastWord = oNode[0][0].rsplit(None, 1)[-1]
                            while lastWord == "</s>":
                                if eob in oNode[0]:
                                    string = oNode[0][0] + " " + lastWord
                                    value = oNode[0][1]
                                    oNode.append((string, value))
                                    sentence = oNode[0][0]
                                    print("Finished!")
                                    break
                                else:
                                    oNode.pop(0)
                                    lastWord = oNode[0][0].rsplit(None, 1)[-1]
                        nextWord = biGram[lastWord]
                        del tmp[:]
                        while k > (len(bagOfWord) - 1):
                            for item in nextWord:
                                if item[0] not in oNode[0][0]:
                                    string = oNode[0][0] + " " + item[0]
                                    value = oNode[0][1] + item[1]
                                    oNode.append((string, value))
                            oNode.pop(0)
                            oNode.sort(key=itemgetter(1), reverse=True)
                            lastWord = oNode[0][0].rsplit(None, 1)[-1]
                            if lastWord == "</s>":
                                sentence = oNode[0][0]
                                break
                            nextWord = biGram[lastWord]
                    if t == timer:
                        break
                    t = t + 1
            del oNode[:]
            del cNode[:]
            lst.sort(key=itemgetter(1), reverse=True)
            return lst

        # Input: the bag of words and the list returned by getFirstWord.
        # Output: [('cat', 'girl'), ('cat', 'dog'), ('cat', 'apple')]
        @classmethod
        def splitInToGroups(cls, bagOfWords, lst):
            wGroup = []
            bestWord = lst[0][0].split()[-1]
            if bestWord in bagOfWords:
                bagOfWords.remove(bestWord)
            for item in bagOfWords:
                wGroup.append((bestWord, item))
            return wGroup

... research a method for generating descriptive sentences for images using a natural language model.

1.3 Scope of the thesis

The central idea of the thesis is to study the generation of descriptive sentences for images; to obtain results, ...

... generating descriptive sentences for images opens up the idea of a system that makes searching for images through their descriptions more effective, because images often carry default names or descriptions that do not match their content, and it helps users ...

... automatic generation of image descriptions was proposed in [7]. The description is produced in steps: given an input image, the system detects objects and actions, generates sentences, and ranks them; the output is the best descriptive sentence for the image.
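The appendix program builds a sentence by repeatedly expanding the partial sentence with the highest accumulated bigram log-probability until the words of the bag have been placed. The sketch below illustrates that search idea in a compact form; it is not the thesis implementation. The toy table `BIGRAMS`, the function name `best_first_sentence`, and the `max_expansions` budget are all assumptions introduced for illustration, and the sketch allows a word to repeat, which the appendix code does not. Because every log-probability is negative, ordering the frontier by accumulated score means the first complete sentence that contains all required words is also the highest-scoring one.

```python
import heapq

# Hypothetical toy bigram table (not taken from the thesis):
# BIGRAMS[previous_word] -> list of (next_word, log10 probability).
BIGRAMS = {
    "<s>": [("the", -0.3), ("a", -0.6)],
    "the": [("cat", -0.5), ("dog", -0.7)],
    "a": [("cat", -0.9), ("dog", -0.4)],
    "cat": [("chases", -0.4), ("</s>", -1.0)],
    "dog": [("chases", -0.6), ("</s>", -0.8)],
    "chases": [("the", -0.5), ("a", -0.5)],
}


def best_first_sentence(bigrams, required, max_expansions=10000):
    """Return (sentence, log-probability) for the best sentence that contains
    every word in `required`, or None if the expansion budget runs out."""
    required = set(required)
    # heapq is a min-heap, so each entry stores the negated log-probability.
    frontier = [(0.0, ["<s>"])]
    for _ in range(max_expansions):
        if not frontier:
            return None
        cost, words = heapq.heappop(frontier)
        if words[-1] == "</s>":
            # A finished sentence is accepted only if it uses all bag words.
            if required.issubset(words):
                return " ".join(words[1:-1]), -cost
            continue
        # Expand the best partial sentence with every possible next word.
        for next_word, log_prob in bigrams.get(words[-1], []):
            heapq.heappush(frontier, (cost - log_prob, words + [next_word]))
    return None


if __name__ == "__main__":
    print(best_first_sentence(BIGRAMS, ["cat", "dog"]))
```

A priority queue replaces the repeated `sort` calls on the open list in the appendix code; the expansion order is the same, but each step costs logarithmic rather than linearithmic time in the size of the frontier.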

Pages	64
File size	3.87 MB