Humanities Data Analysis “125-85018_Karsdrop_Humanities_ch01_3p” — 2020/8/19 — 11:00 — page 56 — #25 56 • Chapter 2
    node = lxml.etree.Element('volta')
    root.append(node)
    tree = lxml.etree.ElementTree(root)
    xml_string = lxml.etree.tostring(tree, pretty_print=True).decode()
    # Print a snippet of the tree:
    print(xml_string[:xml_string.find("") + 8] + ' ')

    Let me not to the marriage of minds

2.6.3 TEI

A name frequently mentioned in connection with XML and computational work in the humanities is the Text Encoding Initiative (TEI22). This is an international scholarly consortium that maintains a set of guidelines specifying best practice for marking up texts in humanities scholarship. The TEI is currently used in a variety of digital projects across the humanities, but also in the so-called GLAM sector (Galleries, Libraries, Archives, and Museums). The TEI provides a large online collection of tag descriptions, which can be used to annotate and enrich texts. For example, if someone is editing a handwritten codex in which a scribe has crossed out a word and added a correction on top of the line, the TEI guidelines suggest the use of a <del> element to transcribe the deleted word and the <add> element to mark up the superscript addition. The TEI provides over 500 tags in the current version of the guidelines (this version is called P5).

The TEI offers guidelines and is not a standard, meaning that it leaves users and projects free to adapt these guidelines to their own specific needs. Although many projects use TEI, few are fully compliant with the P5 specification, because small changes to the TEI guidelines are often made to make them usable for specific projects. This can be a source of frustration for developers, because even though a document claims to “use the TEI” or “to be TEI-compliant,” one never really knows exactly what that means.

For digital text analysis, there are a number of great datasets encoded using “TEI-inspired” XML. The
Folger Digital Texts is such a dataset. All XML-encoded texts are located under the data/folger/xml directory. This resource provides very rich and detailed markup: apart from extensive metadata about the play and detailed descriptions of the actors involved, the actual lines have been encoded in such a manner that we know exactly which character uttered

22 http://www.tei-c.org/index.xml
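To make the deletion-and-addition markup described in Section 2.6.3 more concrete, the following is a minimal sketch built with lxml. The sample words, the use of an `l` (verse line) element, the `place="above"` attribute value, and the helper function `corrected_text` are illustrative assumptions rather than material from any particular edition. A real TEI document would also place its elements in the TEI namespace, which is omitted here for brevity.

```python
import lxml.etree

# Hypothetical transcription: the scribe wrote "hart", crossed it out,
# and added "heart" above the line.
line = lxml.etree.Element('l')
line.text = 'My '

deleted = lxml.etree.SubElement(line, 'del')
deleted.text = 'hart'

added = lxml.etree.SubElement(line, 'add', place='above')
added.text = 'heart'
added.tail = ' is true'

xml_string = lxml.etree.tostring(line).decode()
print(xml_string)


def corrected_text(element):
    """Return the text of `element`, skipping the content of <del> children.

    Hypothetical helper: it keeps additions and drops deletions, which
    yields the reading of the line after the scribe's correction.
    """
    parts = [element.text or '']
    for child in element:
        if child.tag != 'del':
            parts.append(corrected_text(child))
        # Tail text follows the child element and is always kept.
        parts.append(child.tail or '')
    return ''.join(parts)


print(corrected_text(line))
```

Because the deletion and the addition are both kept in the markup, the same transcription can serve either reading: keeping `del` and dropping `add` in the helper would give the scribe's original wording instead of the corrected one.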