Humanities Data Analysis, Chapter 4: Processing Tabular Data (proof “125-85018_Karsdrop_Humanities_ch01_3p”, 2020/8/19, 11:01, pp. 146-148)
That was a lot to process. Mastering a rich and complex library such as Pandas requires time, patience, and practice. As you become more familiar with the library, you will increasingly see opportunities to make your code simpler, cleaner, and faster. Our advice is to start with a more verbose solution. That’s not always the most efficient solution, but it ensures that you understand all the individual steps. When such a first draft solution works, you can try to replace certain parts step by step with the various tools that Pandas offers.

4.2.2 Visualizing turnovers

In the previous section, we have shown how to compute annual turnovers. We now provide more insight into the computed turnover series by creating a number of visualizations. One of the true selling points of the Pandas library is the ease with which it allows us to plot our data. Following the expression “show, don’t tell,” let us provide a demonstration of Pandas’s plotting capabilities. To produce a simple plot of the absolute turnover per year (see figure 4.1), we write the following:

    ax = girl_turnover.plot(
        style='o',
        ylim=(-0.1, 3.1),
        alpha=0.7,
        title='Annual absolute turnover (girls)')
    ax.set_ylabel("Absolute turnover")

Figure 4.1. Visualization of the absolute turnover for girl names in the United States of America.

Pandas’s two central data types (Series and DataFrame) feature the method plot(), which enables us to efficiently and conveniently produce high-quality visualizations of our data. In the example above, calling the method plot() on the Series object girl_turnover produces a simple visualization of the absolute turnover per year. Note that Pandas automatically adds a label to the x-axis, which corresponds to the name of the index of girl_turnover. In the method call, we specify four arguments. First, by specifying style='o' we tell Pandas to produce a plot with dots. Second, the argument ylim=(-0.1, 3.1) sets the y-limits of the plot. Third, alpha=0.7 makes the dots slightly transparent, so that overlapping points remain visible. Finally, we assign a title to the plot with title='Annual absolute turnover (girls)'.

The default plot type produced by plot() is of kind “line.” Other kinds include bar plots (kind='bar'), histograms (kind='hist'), pie charts (kind='pie'), and so on. To create a histogram of the annual turnovers (see figure 4.2), we could write something like the following:

    girl_turnover.plot(kind='hist')

Figure 4.2. Histogram of the turnover for girl names in the United States of America.

Although we can discern a tendency towards a higher turnover rate in modern times, the annual turnover visualization does not provide us with an easily interpretable picture. In order to make such visual intuitions clearer and to test their validity, we can employ a smoothing function, which attempts to capture important patterns in our data, while leaving out noise. A relatively simple smoothing function is called “moving average” or “rolling mean.” Simply put, this smoothing function computes, for each data point in the collection, the average of the w most recent data points. For example, if w = 5 and the current data point is from the year 2000, we take the average turnover of the five most recent years (with Pandas’s default trailing window, these are the years 1996 through 2000, so the current year is included). Pandas implements a variety of “rolling” functions through the method Series.rolling(). This method’s argument window allows the user to specify the window size, i.e., the number of data points w in each window. By subsequently calling the method mean() on the object returned by Series.rolling(), we obtain a rolling average of our data. Consider the following code block, in which we set the window size to 25:

    girl_rm = girl_turnover.rolling(25).mean()
    ax = girl_rm.plot(title="Moving average turnover (girls; window = 25)")
    ax.set_ylabel("Absolute turnover")

Figure 4.3. Visualization of the moving average turnover (window is 25 years) for girl names.

The resulting visualization in figure 4.3 confirms our intuition, as we can observe a clear increase of the turnover in modern times. Is there a similar accelerating rate of change in the names given to boys?
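To make the behavior of the rolling mean discussed above more concrete, here is a minimal sketch on a small, made-up series. The values and the names toy, rolling_mean, manual, and previous_only are hypothetical and are not taken from the names data. The sketch assumes the default settings of Series.rolling(), under which each window ends at the current data point and incomplete windows yield missing values.

```python
import pandas as pd

# Hypothetical toy data: one value per year, 1995 through 2000.
toy = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    index=pd.Index(range(1995, 2001), name="year"),
)

w = 5  # window size

# Trailing rolling mean: each window holds the current year
# and the w - 1 years before it.
rolling_mean = toy.rolling(w).mean()

# The first w - 1 results are NaN, because those windows are incomplete.
# The value for 2000 is the mean of the years 1996 through 2000.
manual = toy.loc[1996:2000].mean()

# If the current year should be excluded (strictly the *previous* w
# years), one option is to shift the series by one step first.
previous_only = toy.shift(1).rolling(w).mean()

print(rolling_mean)
print(manual)
print(previous_only)
```

With a window of 25, as in the turnover example, the same logic implies that the first 24 entries of the smoothed series are missing, which is why a plot of the rolling mean starts later than a plot of the raw data.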