
Natural language processing, Regina Barzilay, ocw.mit.edu




Text Segmentation
Regina Barzilay
MIT
October, 2005

Linear Discourse Structure: Example
Stargazers Text (from Hearst, 1994)
• Intro - the search for life in space
• The moon’s chemical composition
• How early proximity of the moon shaped it
• How the moon helped life evolve on earth
• Improbability of the earth-moon system

What is Segmentation?
Segmentation: determining the positions at which topics change in a stream of text or speech.
SEGMENT 1: OKAY tsk There’s a farmer, he looks like ay uh Chicano American, he is picking pears. A-nd u-m he’s just picking them, he comes off the ladder, a-nd he- u-h puts his pears into the basket.
SEGMENT 2: U-h a number of people are going by, and one of them is um I don’t know, I can’t remember the first the first person that goes by.

Skorochodko’s Text Types
• Chained
• Ringed
• Monolith
• Piecewise

Word Distribution in Text
Table removed for copyright reasons. Please see: Figure in Hearst, M. "Multi-Paragraph Segmentation of Expository Text." Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics (ACL 94), June 1994 (http://www.sims.berkeley.edu/~hearst/papers/tiling-acl94/acl94.html)

[Two plots not reproduced; x-axis: Sentence Index, ticks 100 to 500]

Today
• Evaluation measures
• Similarity-based segmentation
• Feature-based segmentation

Evaluation Measures
• Precision (P): the percentage of proposed boundaries that exactly match boundaries in the reference segmentation
• Recall (R): the percentage of reference segmentation boundaries that are proposed by the algorithm
• F = 2PR / (P + R)
Problems?
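The boundary-based measures above can be sketched in a few lines of Python. This is a minimal illustration, not code from the lecture: the function name `boundary_prf` and the choice to represent a segmentation as a set of boundary positions are assumptions made here.

```python
def boundary_prf(hypothesis, reference):
    """Exact-match boundary precision, recall and F.

    Both arguments are iterables of boundary positions (hypothetical
    representation: the index of the sentence that starts a new segment).
    """
    hyp, ref = set(hypothesis), set(reference)
    matched = len(hyp & ref)  # proposed boundaries that match exactly

    precision = matched / len(hyp) if hyp else 0.0
    recall = matched / len(ref) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# A boundary proposed at 11 instead of 10 earns no credit at all.
p, r, f = boundary_prf(hypothesis={5, 11, 15}, reference={5, 10, 15})
print(p, r, f)
```

In this example the boundary at 11 is one position away from the reference boundary at 10, yet it counts as both a false alarm and a miss. That insensitivity to near misses is one plausible answer to the "Problems?" prompt.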
Evaluation Metric: Pk Measure
[Figure not reproduced: a hypothesized segmentation compared against the reference segmentation, with probe positions labeled okay, miss, false alarm, okay]
Pk: the probability that a randomly chosen pair of words k words apart is inconsistently classified (Beeferman ’99)
• Set k to half of the average segment length
• At each location, determine whether the two ends of the probe are in the same or different segments. Increase a counter if the algorithm’s segmentation disagrees with the reference segmentation
• Normalize the count between 0 and 1 based on the number of measurements taken
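The Pk procedure described above can be sketched as follows. This is an illustrative reading of the steps, not the lecture's own code: the names `segment_ids` and `pk`, the segment-length input format, and the rounding of k are all assumptions.

```python
def segment_ids(segment_lengths):
    """Expand segment lengths, e.g. [2, 3], into per-word ids [0, 0, 1, 1, 1]."""
    ids = []
    for seg, length in enumerate(segment_lengths):
        ids.extend([seg] * length)
    return ids


def pk(hyp_lengths, ref_lengths, k=None):
    """Pk: fraction of probes of width k on which the two segmentations disagree.

    Both segmentations are given as lists of segment lengths in words
    (a hypothetical input format chosen for this sketch).
    """
    hyp = segment_ids(hyp_lengths)
    ref = segment_ids(ref_lengths)
    if len(hyp) != len(ref):
        raise ValueError("segmentations must cover the same number of words")

    if k is None:
        # Half of the average reference segment length, at least 1.
        k = max(1, round(len(ref) / len(ref_lengths) / 2))

    probes = len(ref) - k
    if probes <= 0:
        raise ValueError("text is too short for a probe of width k")

    errors = 0
    for i in range(probes):
        same_in_ref = ref[i] == ref[i + k]
        same_in_hyp = hyp[i] == hyp[i + k]
        if same_in_ref != same_in_hyp:
            errors += 1  # the two ends of the probe are classified inconsistently

    # Normalize by the number of measurements so the score lies in [0, 1].
    return errors / probes


print(pk(hyp_lengths=[4, 4], ref_lengths=[4, 4]))  # identical segmentations
print(pk(hyp_lengths=[8], ref_lengths=[4, 4]))     # the boundary is missed
```

Lower values are better: a score of 0 means every probe was classified consistently with the reference.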

Posted: 27/11/2022, 21:17