
Humanities Data Analysis



“125-85018_Karsdrop_Humanities_ch01_3p” — 2020/8/19 — 11:00 — page 32 — #1

CHAPTER 2
Parsing and Manipulating Structured Data

2.1 Introduction

In this chapter, we describe how to identify and visualize the "social network" of characters in a well-known play, Hamlet, using Python (figure 2.1). Our focus in this introductory chapter is not on the analysis of the properties of the network but rather on the necessary processing and parsing of machine-readable versions of texts. It is these texts, after all, which record the evidence from which a character network is constructed. If our goal is to identify and visualize a character network in a manner which can be reproduced by anyone, then this processing of texts is essential. We also review, in the context of a discussion of parsing various data formats, useful features of the Python language, such as tuple unpacking.[1]

To lend some thematic unity to the chapter, we draw all our examples from Shakespeariana, making use of the tremendously rich and high-quality data provided by the Folger Digital Texts repository, an important digital resource dedicated to the preservation and sharing of William Shakespeare's plays, sonnets, and poems.

We begin with processing the simplest form of data, plain text, to explain the important concept of "character encoding" (section 2.2). From there, we move on to various popular forms of more complex, structural data markup. The extensible markup language (XML) is a topic that cannot be avoided here (section 2.6), because it is the dominant standard in the scholarly community, used, for example, by the influential Text Encoding Initiative (TEI) (section 2.6.3). Additionally, we survey Python's support for other types of structured data such as CSV (section 2.3), HTML (section 2.7), PDF (section 2.4), and JSON (section 2.5). In the final section, where we eventually (aim to) replicate the Hamlet character network, we hope to show how various file and data formats can be used with Python to exchange data in an efficient and platform-independent manner.

[1] For the idea of visualizing the character network of Hamlet we are indebted to Moretti (2011). Note, however, that Moretti's Hamlet network is not reproducible and depends on ad hoc determinations of whether or not characters interacted.
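As a rough illustration of the tuple unpacking mentioned in the introduction, the sketch below builds a tiny weighted edge list of the kind a character network could be drawn from. Everything in it is an assumption made for illustration: the `scenes` data is invented (it is not taken from the Folger Digital Texts), the function name `count_interactions` is hypothetical, and treating "appears in the same scene" as an interaction is only one possible rule, not necessarily the one the chapter adopts.

```python
from collections import Counter
from itertools import combinations

# Invented toy data (NOT from the Folger Digital Texts): each tuple is
# (scene label, list of characters assumed to speak in that scene).
scenes = [
    ("scene-a", ["Hamlet", "Horatio", "Marcellus"]),
    ("scene-b", ["Hamlet", "Gertrude", "Claudius"]),
    ("scene-c", ["Hamlet", "Horatio"]),
]


def count_interactions(scenes):
    """Count how often each unordered pair of characters shares a scene."""
    edges = Counter()
    # Tuple unpacking in the loop header: each item splits into two names.
    for _label, speakers in scenes:
        # Sorting makes ("Hamlet", "Horatio") and ("Horatio", "Hamlet")
        # collapse into the same key.
        for pair in combinations(sorted(set(speakers)), 2):
            edges[pair] += 1
    return edges


edges = count_interactions(scenes)

# Nested tuple unpacking: each item is ((first, second), weight).
for (first, second), weight in sorted(edges.items()):
    print(f"{first} - {second}: {weight}")
```

Because the pair rule is stated explicitly in code, anyone running the sketch on the same input obtains the same edges, which is the kind of reproducibility the introduction asks for.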

Date posted: 20/11/2022, 11:27
