Humanities Data Analysis “125 85018 Karsdrop Humanities ch01 3p” — 2020/8/19 — 11 04 — page 225 — #25 Introduction to Probability • 225
Bayes’s rule tells us how to update our beliefs about whether or not Hamilton wrote this hypothesized disputed essay, about which we have learned only that the rate of by instances is 14 per 1,000. In such a case the odds turn against Hamilton being the author, approximately to in favor of Madison. The rate of by in the unknown essay is just one piece of evidence we might consider in assessing how much plausibility we assign to the claim “Hamilton wrote paper No. 62.” Mosteller and Wallace consider the rates of thirty words (by is in the group of “final words”) in their analysis (Mosteller and Wallace 1964, 67–68).

6.4 Further Reading

This chapter introduced Bayesian inference, one technique for learning from experience. Among many possible approaches to learning from observation, Bayesian inference provides a specific recipe for using probabilities to describe degrees of belief and for updating those degrees of belief based on observation. Another attractive feature of Bayesian inference is its generality. Provided we can describe our prior degree of belief in an event occurring, as well as how probable some observation would be under various hypotheses about the event, Bayes’s rule provides a recipe for updating our degree of belief in the event after taking the observation into consideration. To recall the example with which we concluded, Bayesian inference provides a principled way of arriving at the claim that “it is very likely that Madison (rather than Hamilton) wrote Federalist No. 62,” given the observed rates of word usage in Federalist No. 62. Historians writing before 1950 had no way of substantiating this claim. Thanks to Bayesian inference and the work of Mosteller and Wallace, the evidence and procedure supporting it are accessible to everyone interested in this case of disputed authorship.

For essential background reading related to this chapter, we recommend Grinstead and Snell (2012), which provides an introduction to discrete and continuous probability. Their book is published by the American Mathematical Society and is available online at https://math.dartmouth.edu/~prob/prob/prob.pdf. For further reading on the topics addressed in this chapter, we recommend an introductory text on Bayesian inference. Readers fluent in single-variable calculus and probability will be well served by Hoff (2009). While Hoff (2009) uses the R programming language to perform computations and visualize results, the code provided can be translated into Python without much effort.

Exercises

Easy

Which of the following terms is used to denote a prior belief? (a) Pr(E|H), (b) Pr(H|E), or (c) Pr(H)

Which of the following terms is used to describe the likelihood of an observation given a hypothesis? (a) Pr(E|H), (b) Pr(E), or (c) Pr(H|E)

Recall the example about Pynchon from the introduction to this chapter. Suppose we improve our stylometric test so that it correctly identifies a novel written by Pynchon 99 percent of the time rather than 90 percent of the time. The false positive rate has also decreased and now equals 0.1 percent. The probability that a novel was written by Pynchon is still 0.001 percent. Suppose another text tests positive on our stylometric test. What is the probability that the text was written by Pynchon?
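The Pynchon exercise, like the Hamilton and Madison example, applies Bayes’s rule to a binary hypothesis. A reader who wants to check an answer numerically might encode the rule as a small Python function. The sketch below is an illustration, not code from this chapter: the function names `posterior` and `posterior_odds` are hypothetical, and the example numbers are made up rather than taken from the exercise.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Pr(H|E) for a binary hypothesis H versus not-H, by Bayes's rule.

    prior_h         -- Pr(H), the prior belief in H
    p_e_given_h     -- Pr(E|H), probability of the evidence if H is true
    p_e_given_not_h -- Pr(E|not-H), probability of the evidence if H is false
    """
    # Pr(E) via the law of total probability over H and not-H.
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    return p_e_given_h * prior_h / p_e


def posterior_odds(prior_odds, p_e_given_h, p_e_given_not_h):
    """Odds form of Bayes's rule: posterior odds = prior odds * likelihood ratio."""
    return prior_odds * (p_e_given_h / p_e_given_not_h)


# Hypothetical numbers, not those of the exercise: a test that fires for
# 80% of true cases and 20% of other cases, applied with an even prior.
print(posterior(0.5, 0.8, 0.2))       # about 0.8
print(posterior_odds(1.0, 0.8, 0.2))  # about 4.0, i.e., 4 to 1 in favor of H
```

The two functions agree: odds of 4 to 1 correspond to a probability of 4/5 = 0.8. The odds form matches the way results for the disputed essay are phrased ("the odds turn against Hamilton").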
Mosteller and Wallace describe Madison’s and Hamilton’s word usage in terms of frequency per 1,000 words. While most essays were longer (typically between 2,000 and 3,000 words), pretending that each document was 1,000 words long and contained a fixed number of occurrences of the words of interest allows us to compare texts of different lengths. Inaccuracies introduced by rounding will not be consequential in this case.10 Calculate the frequency per 1,000 words of upon, the, and enough.

Moderate

Mosteller and Wallace started their investigation of the authorship of the disputed essays in The Federalist Papers by focusing on a handful of words that a close reading of the essays had revealed as distinctive: while, whilst, upon, and enough. Focus on the word enough. Suppose you are about to inspect one of the disputed essays and see whether enough appears. In how many essays by Madison does the word enough occur at least once? In how many essays by Hamilton? Establish values for Pr(H), Pr(E|H), and Pr(E|¬H) that you find credible. (Pr(E|H) here is the probability that the word enough appears in a disputed essay when Hamilton, rather than Madison, is the author.) Suppose you learn that enough appears in the disputed essay. How does your belief about the author change?

Challenging

Consider the rate at which the word of occurs in texts with known authorship. If you were to use a binomial distribution (not a negative binomial distribution) to model each author’s use of the word of (expressed in frequency per 1,000 words), what value would you give to the parameter θ associated with Hamilton? And with Madison?
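The counting tasks in these exercises, and the binomial model in the last one, can be sketched in a few lines of Python. These are hypothetical helpers, not the chapter’s own code. The tokenizer (lowercased runs of ASCII letters found with a regular expression) is an assumption, and other word-splitting rules will shift the counts somewhat. The share of an author’s essays that contain a word is one simple, assumed way to ground a guess at Pr(E|H) for enough; how to choose θ is left to the exercise.

```python
import re
from math import comb


def tokenize(text):
    """Hypothetical tokenizer: lowercased runs of ASCII letters (an assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def rate_per_1000(text, word):
    """Occurrences of `word` per 1,000 tokens of `text`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return 1000 * tokens.count(word) / len(tokens)


def share_containing(texts, word):
    """Fraction of a non-empty list of `texts` in which `word` occurs at least once."""
    hits = sum(1 for text in texts if word in tokenize(text))
    return hits / len(texts)


def binomial_pmf(k, n, theta):
    """Pr(k occurrences in n words) if each word is the target, independently, with probability theta.

    Adequate for n around 1,000; for much larger n, work with logarithms
    or a library routine such as scipy.stats.binom.
    """
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)


# Toy example (made-up text, not an essay from The Federalist Papers).
sample = "Upon the whole, the argument rests upon the text."
print(rate_per_1000(sample, "upon"))  # 2 of 9 tokens, about 222.2
print(share_containing([sample, "Enough said."], "enough"))  # 0.5
```

Applied to the essays of known authorship, `rate_per_1000` gives each author’s frequency of a word, and `binomial_pmf(k, 1000, theta)` gives the probability of seeing k occurrences in a 1,000-word stretch under a chosen θ.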
Working with the parameter values chosen above, suppose you observe a disputed essay with a rate of of s per 1,000 words. Does this count as

10 Each essay is not a sequence of 1,000 words. An essay is a sequence of words of fixed length (assuming we can agree on word-splitting rules). The writing samples from Madison and Hamilton tend to be about 2,000 words long, on average, and the words of interest are common, so the consequences of this infidelity to what we know to be the case will be limited.