Humanities Data Analysis “125-85018_Karsdrop_Humanities_ch01_3p” — 2020/8/19 — 11:01 — page 136 — #11 • Chapter 4
Remember that the data structure underlying a DataFrame object is a two-dimensional NumPy ndarray object. The underlying ndarray can be accessed through the values attribute. As such, the same indexing and slicing mechanisms can be employed directly on this more low-level data structure, i.e.:

    array = df.values
    array_slice = array[1000:1100, -2:]

In what preceded, we have covered the very basics of working with tabular data in Pandas. Naturally, there is a lot more to say about both of Pandas’s objects. In what follows, we will touch upon various other functionalities provided by Pandas (e.g., more advanced data selection strategies and plotting techniques). To liven up this chapter’s exposition of Pandas’s functionalities, we will do so by exploring the long-term shift in naming practices as addressed in Lieberson (2000).

4.2 Mapping Cultural Change

4.2.1 Turnover in naming practices

Lieberson (2000) explores cultural changes in naming practices in the past two centuries. As previously mentioned, he describes an acceleration in the rate of change in the leading names given to newborns. Quantitatively mapping cultural changes in naming practices can be realized by considering their “turnover series” (cf. Acerbi and Bentley 2014). Turnover can be defined as the number of new names that enter a popularity-ranked list at position n at a particular moment in time t. The computation of a turnover series involves the following steps. First, we calculate an annual popularity index, which contains all unique names of a particular year ranked according to their frequency of use in descending order. Subsequently, the popularity indexes are put in chronological order, allowing us to compare the index for each year to that of the previous year. For each position in the ranked lists, we count the number of “shifts” in the ranking that have taken place between two consecutive years.
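As a minimal sketch of the steps just described, the following plain-Python functions count, for a cutoff n, how many names appear in the top n of one ranked list but not in the top n of the preceding list. The function names and the toy rankings are invented for illustration (they are not the book's table 4.1 or its implementation), and the sketch assumes each ranked list is already sorted from most to least popular.

```python
def turnover(previous, current, n):
    """Count names in the top n of `current` that are absent from the top n of `previous`."""
    return len(set(current[:n]) - set(previous[:n]))


def turnover_series(ranked_lists, n):
    """Compute the turnover at cutoff n for each pair of consecutive ranked lists."""
    return [
        turnover(previous, current, n)
        for previous, current in zip(ranked_lists, ranked_lists[1:])
    ]


# Hypothetical toy rankings in chronological order (most popular name first).
rankings = [
    ["John", "Henry", "Mary", "William"],
    ["John", "Henry", "William", "Mary"],
    ["Henry", "John", "William", "Mary"],
]

print(turnover_series(rankings, n=3))
print(turnover_series(rankings, n=1))
```

Each entry in the returned list is the count for one pair of consecutive points in time.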
This “number of shifts” is called the turnover. Computing the turnover for all time steps in our collection yields a turnover series. To illustrate these steps, consider the artificial example in table 4.1, which consists of five chronologically ordered ranked lists. For each two consecutive points in time, for example t1 and t2, the number of new names that have entered the ranked lists at a particular position n is counted. Between t1 and t2, the number of new names at positions 1 and 2 equals zero, while at position 3, there is a different name (i.e., William). When we compare t2 and t3, the turnover at the highest rank equals one, as Henry takes over the position of John. In what follows, we revisit these steps, and