AP STATISTICS
Summer FUN
2015-2016 School Year

Brief Description of Summer Assignment: This packet contains information and examples of basic statistics problems, as well as exercises for the student to complete.

Resources Necessary to Complete Assignment: Graphing calculator, Internet access

Objective of Summer Assignment: For students to gain an understanding of basic statistical topics that should be known before starting AP Statistics. Students should also learn important vocabulary that will be used throughout the year.

Approximate time commitment during the Summer: – hours

Due Date: SECOND day of scheduled class

Value of Assignment: 75 points

For questions over the summer, please contact: Ms. Esfandiari - heather.esfandiari@lcps.org

REMEMBER: THE SBHS HONOR CODE APPLIES TO THIS ASSIGNMENT: DO NOT COPY ANSWERS FROM YOUR CLASSMATES.

Welcome to AP Statistics! This course is built around four main topics: exploring data, planning a study, probability as it relates to distributions of data, and inferential reasoning. Among leaders of industry, business, government, and education, almost everyone agrees that some knowledge of statistics is necessary to be an informed citizen and a productive worker. This assignment is due the SECOND day of class and will count for 75 points.

Summer Packet Guidelines

Start the summer assignment early to allow time to receive clarification (if necessary) and to complete it by the SECOND day of class. If you have any questions, you may contact me. Please do not wait until the last minute to contact me; I will be busy preparing for the upcoming school year and may not be able to respond as quickly to your last-minute questions!
E-mail for Questions: Ms. Esfandiari – heather.esfandiari@lcps.org

I have provided a small resource of information on statistical basics at the end of this packet (Appendix 2). However, if you are still stuck and cannot complete the problems on your own, it is okay to use math reference books and websites to help. Google is a wonderful thing! You can Google any term or concept if you want to find more information. I also recommend the following websites:
http://stattrek.com/
http://calculator.maconstate.edu/calc_topics.html (Calculator help)

Do your work in this packet only! There should be enough room to write all answers. Only use separate paper if absolutely necessary.

I RECOMMEND YOU HAVE YOUR OWN GRAPHING CALCULATOR AND BRING IT TO CLASS EVERY DAY!! A TI-83 is the minimum calculator needed for this course; a TI-84 or TI-84+ is better. The TI-84 will be the calculator demonstrated in class. Do not discard the owner’s manual that is included when you purchase a calculator. If you choose not to use the TI-84+ (or TI-83), it will be your responsibility to learn where to locate the functions we use in class.

I highly recommend you purchase a copy of the review book, 5 Steps to a 5 AP Statistics, 2014-2015 Edition (5 Steps to a 5 on the Advanced Placement Examinations Series), ISBN: 0071802479. To obtain a copy of the book, I recommend either a bookseller (e.g., Barnes & Noble) or Amazon (currently $10.99 on Amazon); Amazon also has used copies. If you purchase a used copy, please make sure it is not written in.

Remember, this is an AP course! Do not expect this to be an “easy course.” Although it may not seem as difficult computationally as calculus, it requires a great deal of outside reading and homework, and it requires a thorough understanding of many abstract concepts. This is as much a writing course as it is a math course!
Explaining in complete sentences is required on this assignment and throughout the course. You cannot just write down numbers and be done; you must use numbers in context (what they mean in that particular problem), with appropriate units such as feet or $.

Enjoy your summer!
Ms. Esfandiari

Name ________ Block ________ Date ________

Part 1: Why Statistics?

A. What is a statistician?
Write one informative paragraph explaining what you think a statistician does. Use two reputable sources (Wikipedia doesn’t count) to help develop your paragraph. The website http://www.amstat.org/careers/ is a good one!

B. Why take statistics? A persuasive essay
Write two to three paragraphs explaining why high school students should take a statistics class. Support your reasoning with evidence from the following sources:
http://www.ted.com/talks/lang/eng/arthur_benjamin_s_formula_for_changing_math_education.html
http://www.wired.com/magazine/2010/04/st_thompson_statistics/

C. Why are YOU taking statistics? What are you going to do to ensure success?
Read the letters at the end of this packet written by former AP Statistics students (in Appendix 3). Write one paragraph explaining what you hope to gain from taking a class in statistics. What are your reasons for signing up for this class? What do you hope to get out of the class? What is your plan to ensure success in AP Statistics?

Requirements of the paper: The final two-page paper should be typed, double-spaced, in Times New Roman 12 pt black font. It should include sections properly dividing the paper. Remember to reference your sources!
Please submit your two-page write-up to Heather.esfandiari@lcps.org BEFORE the first day of class.

Part 2: Reading and Writing

Read the two articles at the end of this packet (“Research Basics: Interpreting Change” and “Overstating Aspirin's Role in Breast Cancer Prevention”) from the Washington Post, and then answer the following questions in complete sentences.

1. What was the story that the newspapers wrote after the research was published by the Journal of the American Medical Association?
2. What other information needed to be added to the story so that people could make decisions for themselves about the use of aspirin to prevent breast cancer?
3. How was the data collected to perform this study?
4. What type of study was performed?
5. Can this type of study be used to prove that aspirin prevents breast cancer?
6. What type of study must be done in order to ‘prove’ something?
7. What is the difference between ‘cause’ and ‘association’?
8. You may have heard the statement “you can prove anything with statistics.” Using what you have learned from reading this article, explain what you think is meant by this statement.

Go on the internet to www.gapminder.org and select the “Gapminder World” panel; the scatterplot should load. You are looking at worldwide data on Life Expectancy vs. Per Capita Income. Point your cursor at the x-axis or y-axis labels to get more information about these variables. Every colored circle on the graph represents a country. Point the cursor at various circles and the name of the country will appear. The size of each circle is proportional to that country’s population; look in the lower right corner to see each country’s population as you point the cursor at it. If you would like, slide the year indicator back to the first year that data was recorded (1950 for this combination of variables), and then click on “Play” to watch the change in the scatterplot, year by year, from that year to the present. Even more fun is to select one or more countries (this causes all the other countries
to dim into the background), and watch the track made by the selected countries over time.

9. What is the relationship between Per Capita Income and Life Expectancy in the world?
10. Which countries are the farthest from the pattern shown by the rest of the world?
11. Which country has the highest life expectancy now?
12. Which has the highest per capita income now?
13. Which has the lowest income now?
14. The lowest life expectancy now?
15. Which group of countries (by color) has gained most since 1950 relative to the rest of the world, in both income and life expectancy?
16. Watch the “track” of Rwanda from 1950 – 2010. What events in Rwanda might explain the unusual changes that happened?

Part 3: Vocabulary List

Please define, IN YOUR OWN WORDS (handwritten), each of the following terms using the information on the StatTrek website. When asked, provide a unique example of the word. Examples from the StatTrek website or this packet will NOT receive credit.

1. Categorical Variables:
   Example:
2. Quantitative Variables:
   Example:
3. Univariate Data:
4. Bivariate Data:
5. Median:
6. Mean:
7. Population:
   Example:
8. Sample:
   Example:
9. Center:
10. Spread:
11. Symmetry:
12. Unimodal and Bimodal:
13. Skewness:
    Sketch Skewed Left:
    Sketch Skewed Right:
14. Uniform:
15. Gaps:
16. Outliers:
17. Dotplots:
18. Difference between bar chart and histogram:
19. Stemplots:
20. Boxplots:
21. Quartiles:
22. Range:
23. Interquartile Range:
24. Parallel boxplots:
25. Parameter:
26. Statistic:

Part 4: Practice Problems

CATEGORICAL OR QUANTITATIVE
Determine if the variables listed below are quantitative or categorical. Neatly print “Q” for quantitative and “C” for categorical.

___ Time it takes to get to school
___ Height
___ Number of shoes owned
___ Amount of oil spilled
___ Hair color
___ Age of Oscar winners
___ Temperature of a cup of coffee
___ Type of pain medication
___ Jellybean flavors
___ Teacher salaries
___ Country of origin
___ Gender
___ Type of meat
___ Facebook user

STATISTIC – WHAT IS THAT?
A statistic is a number calculated from data. Many different statistics can be calculated from quantitative data. Determine the given statistics from the data below on the number of home runs Mark McGwire hit in each season from 1982 – 2001.

70 39 52 65 22 42 49 29 32 32 58 39 33

Mean ____ Minimum ____ Maximum ____ Median ____ Q1 ____ Q3 ____ Range ____ IQR ____

CENTER & SPREAD OF A DISTRIBUTION: (REVIEW NOTES IN APPENDIX 2)

Last year students collected data on the age of their moms and dads when they (the students…) were born. The following are their results.

Dad: 41 27 27 26 23 28 31 32 30 32 33 35 26 27 32 33 43 34 25 34 34 34 27 35 25 34
Mom: 39 24 26 23 23 24 30 32 28 23 33 30 23 24 32 29 38 34 23 35 35 26 24 31 24 33

Find the mean and the median for the Dad data. To find the mean using your calculator, go to 2nd STAT MATH, choose sum(, and then type in L1 by pressing 2nd 1. This will add all the values in the list. Then divide by 26 to get the mean. Round Mean to Decimal places. To find the median, sort the data in the list: STAT 2 L1. With 26 values, the median is exactly in the middle between the 13th and the 14th value.

Mean ____ Median ____
Are they the same? If not, which is larger?

Find the mean and the median for the Mom data.

Mean ____ Median ____
Are they the same? If not, which is larger?

Now compare the two means you calculated. Which is larger? Is this result what you expected? Why/why not? Give an explanation in real-world context.

Calculate the range for each set of data.
Dad ____ Mom ____
Are these ranges about the same? If not, what are some reasons that might cause this difference? Give an explanation in real-world context.

Find Q1 and Q3 for the Dad data. Q1 ____ Q3 ____
Find Q1 and Q3 for the Mom data. Q1 ____ Q3 ____

You have now calculated the “Five-Number Summary.” This can also be used as a way to determine the spread of a set of data. The five-number summary consists of: Minimum, Q1, Median, Q3, Maximum.

Write the five-number summary for the Dad data: ____
Write the five-number summary for the Mom data: ____

Now calculate the IQR for each of the two sets of data.
Dad ____ Mom ____

10. WEATHER!
The data below give the number of hurricanes that happened each year from 1944 through 2000, as reported by Science magazine.
a. Make a dotplot to display these data. Make sure you include appropriate labels, title, and scale.

12. SHOPPING SPREE!
A marketing consultant observed 50 consecutive shoppers at a supermarket. One variable of interest was how much each shopper spent in the store. Here are the data (rounded to the nearest dollar), arranged in increasing order:
a. Make a stemplot using tens of dollars as the stem and dollars as the leaves. Make sure you include appropriate labels, title and key.
KEY

13. WHERE DO OLDER FOLKS LIVE?
This table gives the percentage of residents aged 65 or older in each of the 50 states. Histograms are a way to display quantitative data grouped into bins (the bars). These bins have the same width and scale, and they touch because the number line is continuous. To make a histogram, you must first decide on an appropriate bin width and count how many observations are in each bin. The bins for percentage of residents aged 65 or older have been started below for you.
a. Finish the chart of bin widths and then create a histogram using those bins on the grid below. Make sure you include appropriate labels, title and scale.

14. SSHA SCORES
Here are the scores on the Survey of Study Habits and Attitudes (SSHA) for 18 first-year college women:
154 109 137 115 152 140 154 178 101 103 126 126 137 165 165 129 200 148
and for 20 first-year college men:
108 140 114 91 180 115 126 92 169 146 109 132 75 88 113 151 70 115 187 104
a. Put the data values in order for each gender. Compute numerical summaries for each gender.

Appendix 1: Articles

Research Basics: Interpreting Change
Tuesday, May 10, 2005

How Big Is the Difference?
Many medical studies end up concluding that two groups have different health outcomes: death rates, heart attack rates, cholesterol levels and so forth. This difference is typically expressed as a relative change, as in the statement: "The treatment group had 50 percent fewer cases of eye cancer than the control group." The problem with this comparison is that it provides no information about how common eye cancer is in either group.

Thinking about relative changes in risk is like deciding when to use a coupon at a store. Imagine you have a coupon that says "50 percent off any one purchase." You go to the store to buy a pack of gum for 50 cents and a large Thanksgiving turkey for $35. Will you use the coupon for the gum or the turkey? Most people would use it for the turkey. Why? Because paring half the price off $35 reaps a bigger savings ($17.50) than cutting half off 50 cents ($0.25).

The analogy in health is that "50 percent fewer cases" is a very different number when applied to eye cancer (a rare problem accounting for about 2,000 new cases in the U.S. each year) than when applied to heart attacks (a common problem accounting for about 800,000 new cases annually).

To really understand how big a difference is, you need to find out the starting and ending points, sometimes called "absolute risks." In the coupon example, the start and end points are the regular and the sales price. In a study about medical treatment, the start and end points are the chances of something happening in the untreated and treated groups.

Presenting the starting and ending point requires a few more words than presenting relative changes. For example, "In a year, two of 100,000 untreated people developed eye cancer; in contrast, one of 100,000 treated people developed eye cancer." For the price of a few more words you gain perspective: The chance of developing eye cancer is small.

Cause or Association?
Many important insights into human health come from observational studies: studies in which the researcher simply records what happens to people in different situations, without intervening. Such studies first linked cigarette smoking to lung cancer and high cholesterol to heart disease. But not all observed associations represent cause and effect. And problems can occur when this key point is overlooked.

An example may help make the distinction clear. A man thought his rooster made the sun rise. Why? Because each morning when he woke up while it was still dark, he would hear his rooster crow as the sun rose. He confused association with causation until the day his rooster died, when the sun rose without any help.

A more serious example involves the long-held belief that most women should take estrogen after menopause. That idea, only recently discredited, also came from observational studies. The observation, shown in more than 40 studies involving hundreds of thousands of women, was that women who took estrogen supplements also had less heart disease. But it turned out that estrogen was not the reason why this was the case. Instead, women taking estrogen tended to be healthier and wealthier. Their health and wealth, not their estrogen supplements, were responsible for the lower risk of heart disease.

The only way to reliably distinguish a cause from an association is to conduct a true experiment: a randomized trial. In this type of study, patients are assigned randomly (that is, by chance) to receive a therapy or not receive it. This study design is the best way to construct two groups that are similar in every way except one: whether they get the therapy being studied. That means any differences observed afterward must be caused by the therapy. In the case of estrogen and heart disease, such a study showed that the long-held beliefs were wrong.

Unfortunately, it is not always possible to do a randomized trial. For example, it is extremely unlikely that we could get people to agree to be
randomly assigned to either eating only fast food or only organic food every day for a year (and that they would actually adhere to the diet if they did agree to be randomized). In such cases, scientists have to rely on observational studies. But when new tests or treatments are proposed, randomized trials ought to be conducted prior to their widespread use. Doctors prescribed estrogen to millions of women for many years until the randomized trial showed that intuition and dozens of observational studies were wrong.

Lisa M. Schwartz, Steven Woloshin and H. Gilbert Welch

A May 10 Health section story about a study exploring aspirin use and breast cancer prevention incorrectly labeled hormone receptor positive cancers the most dangerous kind. That description applies to hormone receptor negative breast cancers.

Overstating Aspirin's Role in Breast Cancer Prevention
How Medical Research Was Misinterpreted to Suggest Scientists Know More Than They Do
By Lisa M. Schwartz, Steven Woloshin and H. Gilbert Welch
Special to The Washington Post
Tuesday, May 10, 2005

Medical research often becomes news. But sometimes the news is made to appear more definitive and dramatic than the research warrants. This series dissects health news to highlight some common study interpretation problems we see as physician researchers and show how the research community, medical journals and the media can do better.

Preventing breast cancer is arguably one of the most important priorities for women's health. So when the Journal of the American Medical Association published research a year ago suggesting that aspirin might lower breast cancer risk, it was understandably big news. The story received extensive coverage in top U.S. newspapers, including The Washington Post, the Wall Street Journal, the New York Times and USA Today, and the major television networks. The headlines were compelling: "Aspirin May Avert Breast Cancer" (The Post), "Aspirin Is Seen as Preventing Breast Tumors" (the Times). In each story, the
media highlighted the change in risk associated with aspirin, noting prominently something to the effect that aspirin users had a "20 percent lower risk" compared with nonusers. The implied message in many of the stories was that women should consider taking aspirin to avoid breast cancer.

But the media message probably misled readers about both the size and certainty of the benefit of aspirin in preventing breast cancer. That's because the reporting left key questions unanswered:
· Just how big is the potential benefit of aspirin?
· Is it big enough to outweigh the known harms?
· Does aspirin really prevent breast cancer, or is there some other difference between women who take aspirin regularly and those who don't that could account for the difference in cancer rates?

This article offers a look at how the message got distorted, what the findings really signify and some broader lessons about interpreting medical research.

How Big a Benefit?

Just how big is the potential benefit of aspirin? The 20 percent reduction in risk certainly sounds impressive. But to really understand what this statistic means, you need to ask, "20 percent lower than what?"
In other words, you need to know the chance of breast cancer for people who do not use aspirin. Unfortunately, this information did not appear in any of the media reports. While it might be tempting to fault journalists for sloppy, incomplete reporting, it is hard to blame them when the information was missing from the journal article itself.

In the study, Columbia University researchers asked approximately 3,000 women with and without breast cancer about their use of aspirin in the past. The typical woman in this study was between the ages of 55 and 64. According to the National Cancer Institute, about 20 out of 1,000 women in this age group will develop breast cancer in the next five years. Therefore, the "20 percent lower chance" would translate into a change in risk from 20 per 1,000 women to 16 per 1,000, or four fewer breast cancers per 1,000 women over five years. For people who prefer to look at percentages, this means that 2 percent develop breast cancer without aspirin, while 1.6 percent develop it with aspirin, for an absolute risk reduction of 0.4 percent over five years.

Another way to present these results would be to say that a woman's chance of being free from breast cancer over the next five years was 98.4 percent if she used aspirin and 98 percent if she did not. Seeing the actual risks leaves a very different impression than a statement like "aspirin lowers breast cancer risk by 20 percent." (See "Research Basics: How Big Is the Difference?")

Against What Size Harms?

Is the potential benefit of aspirin big enough to outweigh its known harms?
Unfortunately, aspirin, like most drugs, can have side effects. These, according to the U.S. Preventive Services Task Force, include a small risk of serious (and possibly fatal) bleeding in the stomach or intestine, or strokes from bleeding in the brain: harms briefly noted but not quantified in the original study or in most media reports.

To decide whether aspirin is worth taking, women need to know how the potential size of aspirin's benefit in reducing breast cancer compares with the drug's potential harms. Sound medical practice dictates doing the same kind of calculation, of potential benefits against potential harms, anytime you consider taking a drug.

We provide the relevant information in the "Aspirin Study Facts," below. The first column shows the health outcome being considered (e.g., getting breast cancer, having a major bleeding event). The second column shows the chance of the outcome over five years for women not taking aspirin. The third column shows the corresponding chance for women taking aspirin. And the fourth column shows the difference: the possible effect of aspirin.

As the table shows, the size of the known risk for stomach bleeding to a woman taking aspirin daily nearly matches the size of the still-hypothetical benefit in terms of breast cancer protection. That kind of comparison might lead some women to conclude that the tradeoff doesn't warrant the risk.

While it may take you some time to become familiar with this table, we think this sort of presentation would be helpful in many situations; for example, whenever people are deciding about taking a new medication or undergoing elective surgery.

Is It Really Aspirin?

Does aspirin really prevent breast cancer, or is there some other difference between women in the study that could account for the difference in cancer rates? Can we be sure that aspirin was responsible for the "20 percent fewer" breast cancers that the Columbia researchers found among aspirin users compared with nonusers?
To understand why not, it is necessary to know some of the details about how the study was conducted. The researchers collected information from all of the women in New York's Nassau and Suffolk counties on Long Island who were diagnosed with breast cancer in 1996 and 1997. For comparison, they matched these women with others who did not have breast cancer, but who were about the same age and from the same counties. The researchers asked all the women about their use of aspirin. They found that aspirin use was more common among the women without breast cancer.

While the researchers were careful to report that the use of aspirin was "associated" with reduced risk of breast cancer, the media used stronger language, suggesting aspirin played a role in preventing breast tumors. Unfortunately, this kind of study (an observational study) cannot prove that it was the aspirin that lowered breast cancer risk. Strictly speaking, the researchers demonstrated only that there is an association between aspirin and breast cancer.

Consider how an association between aspirin and breast cancer could exist even if aspirin has no effect on breast cancer. It could be that women who use aspirin regularly are already at a lower risk of breast cancer. Imagine, for example, there was a gene that protected against breast cancer but also made people more susceptible to pain. Women who carried this gene would be more apt to use aspirin for pain relief. The lower breast cancer risk in aspirin users might simply reflect the fact that they had this gene. In other words, aspirin might have nothing to do with the findings.

To really know if aspirin lowers breast cancer risk would require a different kind of study: a randomized trial. (See "Research Basics: Cause or Association?")

Nonetheless, observational studies are important (and often crucial) in building the case for doing a randomized trial. In this instance, the researchers had a theory for how aspirin might prevent breast cancers. They predicted that it
would only be true for certain kinds of cancers (so-called hormone receptor positive cancers, the most dangerous kind, which account for about 60 percent of all breast cancers). And that is just what they observed: The association between aspirin and breast cancer was not seen in hormone receptor negative cancers. That the researchers' prediction was correct supports (but does not prove) the idea that aspirin reduces risk. The next logical step would be a randomized trial.

The difference between "cause" and "association" may seem subtle, but it is actually profound. Even so, people, like the headline writers in this case, often go beyond the evidence at hand and assume that an association is causal. Readers should know that many associations do not reflect cause and effect.

The Bottom Line

In a large observational study, researchers found slightly fewer breast cancers among women who took aspirin regularly compared with women who did not. Because aspirin's benefit in reducing breast cancer (assuming it can be proven) was small, it may not outweigh the drug's known harms.

While it is possible that aspirin itself reduces the risk of breast cancer, we cannot be sure from this study. It would take a randomized trial to be certain. Fortunately, one has just been completed by researchers at Harvard Medical School, and the results are expected in the very near future. Until then, it is too soon to recommend taking aspirin to prevent breast cancer.

· Lisa Schwartz, Steven Woloshin and Gilbert Welch are physician researchers in the VA Outcomes Group in White River Junction, Vt., and faculty members at the Dartmouth Medical School. They conduct regular seminars on how to interpret medical studies. (See http://www.vaoutcomes.org.)
The views expressed do not necessarily represent the views of the Department of Veterans Affairs or the United States Government.

© 2005 The Washington Post Company

Appendix 2: “Quick Reference” of Statistical Basics

I. Types of Data

Quantitative (or measurement) Data
These are data that take on numerical values that actually represent a measurement, such as size, weight, how many, how long, score on a test, etc. For these data, it makes sense to find things like “average” or “range” (largest value – smallest value). By contrast, it doesn’t make sense to find the mean shirt color, because shirt color is not a quantitative variable. Some quantitative variables take on discrete values, such as shoe size (6, 6½, 7, …) or the number of soup cans collected by a school. Other quantitative variables take on continuous values, such as your height (60 inches, 72.99999923 inches, 64.039 inches, etc.) or how much water it takes to fill up your bathtub (73.296 gallons or 185 gallons or 99 gallons, etc.)
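
As a rough illustration of the point above (an “average” or “range” is meaningful for quantitative data but not for something like shirt color), here is a minimal Python sketch. The heights and shirt colors are made-up values chosen only for this example, the variable names are hypothetical, and counting how often each color appears is shown as one reasonable summary for non-numeric data, not as a method prescribed by this packet.

```python
from collections import Counter
from statistics import mean

# Hypothetical quantitative data: heights in inches (made-up values).
heights = [60, 64, 72, 66, 68]

# "Average": the sum of the values divided by how many values there are.
average_height = mean(heights)

# "Range": largest value minus smallest value.
height_range = max(heights) - min(heights)

# Hypothetical non-quantitative data: shirt colors (made-up values).
shirt_colors = ["red", "blue", "blue", "green", "blue"]

# A "mean shirt color" has no meaning, so instead we count
# how many times each color appears.
color_counts = Counter(shirt_colors)

print("Average height:", average_height)
print("Range of heights:", height_range)
print("Shirt color counts:", dict(color_counts))
```

The same two calculations (average and range) can be done on a graphing calculator; the sketch only shows why they apply to measurements and not to labels.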
Categorical (or qualitative) Data
These are data that take on values that describe some characteristic of something, such as the color of shirts. These values are “categories” of a population, such as M or F for gender of people, or Don’t Drive or Drive for the method of transportation used by students to get to school. These are examples of binary variables, which have only two possible values. Some categorical variables have more than two values, such as hair color, brand of jeans, and so on.

Two types of variables:
Quantitative: Discrete or Continuous
Categorical: Binary or More than 2 categories

II. Numerical Descriptions of Quantitative Data

Measures of Center

Mean: The sum of all the data values divided by the number (n) of data values.
Example
Data: 4, 36, 10, 22, 9
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{4 + 36 + 10 + 22 + 9}{5} = \frac{81}{5} = 16.2 \]

Median: The middle element of an ordered set of data.
Examples
Data: 4, 36, 10, 22, 9; in order: 4 9 10 22 36; Median = 10
Data: 4, 36, 10, 22, 9, 43; in order: 4 9 10 | 22 36 43; Median = (10 + 22)/2 = 16

Measures of Spread

Range: Maximum value – Minimum value
Example
Data: 4, 36, 10, 22, 9; in order: 4 9 10 22 36; Range = Max – Min = 36 – 4 = 32

Interquartile Range (IQR): The difference between the 75th percentile (Q3) and the 25th percentile (Q1). This is Q3 – Q1. Q1 is the median of the lower half of the data and Q3 is the median of the upper half. In neither case is the median of the data included in these calculations. The IQR spans the middle 50% of the data. Each quartile contains 25% of the data.
Examples
Data: 4, 36, 10, 22, 9; in order: 4 9 | 10 | 22 36; Q1 = (4 + 9)/2 = 6.5 and Q3 = (22 + 36)/2 = 29. So, the IQR = 29 – 6.5 = 22.5
Data: 4 9 10 | 22 36 43; Q1 = 9 and Q3 = 36. So, the IQR = 36 – 9 = 27

Five-number summary: consists of Minimum, Q1, Median, Q3, and Maximum.

To find these statistics, enter the data you have into your calculator using the list function: STAT ENTER, then type the data into L1. If you make a mistake, you can go to the error and DELETE. If you forget an item, you can go to the line below where it is supposed to be and press 2nd DEL to insert it. To find each value of the
five-number summary, go to 2nd STAT MATH and then type in L1 by pressing 2nd 1.

NOTE: If the lists you are using already have numbers in them before you start, you can clear them this way: Arrow up (↑) to the line where L1 is shown. Press CLEAR, then the down arrow (↓).

III. Graphical Displays of Univariate (one variable) Data
• Dotplot
• Stemplot (Stem and Leaf)
• Boxplot (Box and Whiskers)
• Histogram

To make a Dotplot:
1. Draw and label a number line so that all the values in your dataset will fit.
2. Graph each of the data values with a dot.
3. Be sure to line the dots up vertically as well as horizontally so that you can really see the shape of the graph.

[Figure: Dot Plot of Student GPAs; horizontal axis: GPA, 0.5 to 4.0]

Stemplot of Student GPAs
1 | 23
1 | 444
1 | 67
1 | 88888999
2 | 00000000000000000111111111
2 | 3333333333333333333333
2 | 4444444444444444445555555555
2 | 66666666666677777
2 | 8888888888999999999999999
3 | 0000000000000000000111111111
3 | 2233333333333333
3 | 44444444455
3 | 6666677
3 | 889
Key: 3|4 = 3.4

TO MAKE A STEMPLOT:
1. Put the data in ascending order.
2. Make a key!
3. Use only the last digit of the number as a leaf (see the numbers to the right of the line; each digit is the last digit of a larger number).
4. Use one, two, or more digits as the stem. (Sometimes, you can truncate data when there are too many digits in each data value; e.g., the number 20,578 would become 20 | 5, where the “20” is in thousands. Note that this is different from rounding.)
5. Place the “stem” digit(s) to the left of the line and the leaf digit to the right of the line. Do this for each data value.
6. You should then arrange the “leaves” in ascending order.

Sometimes, there are many numbers with the same “stem.” In this situation it might be useful to break the numbers with the same stem into either two distinct groups (each on a separate line; say, “leaves” from 0–4 on the first line and 5–9 on the second)
or into five distinct groups, as is shown in the stemplot of student GPAs. Here, the first line for each stem contains all the 0–1 leaves, the next line contains the 2–3 leaves, and so on. This technique is called “splitting the stems.” It is useful in some cases in order to show the shape of the data more clearly.

To make a Boxplot:
•Draw and label a number line that includes the minimum and the maximum values for the set of data.
•Calculate the five-number summary and make a dot for each of these summary numbers above the number line.
•Draw a line between the 1st and 2nd dot, showing the “lower quartile”; and then draw a line from the 4th to the 5th dot to show the “upper quartile.” These are commonly called the “whiskers.”
•Draw a rectangular box from the 2nd to the 4th dot and draw a line through the box on the middle dot (the median).

[Figure: Boxplot of Student GPAs, with GPA on the horizontal axis]

NOTE: In AP Statistics, a “modified boxplot” is used. This shows any “outliers.” An outlier is a data point that does not fit the pattern of the rest of the data. When your calculator or computer software graphs a modified boxplot, an algorithm is used to determine what it takes to “not fit the pattern of the rest of the data.” This algorithm is: any value more than 1.5*(IQR) away from the “box” part of the graph (above or below the box) is an outlier. These outliers are shown with dots or stars, or any other small symbol.

To make a histogram:
•Put the data into ascending order.
•Decide upon evenly spaced intervals into which to divide the set of data (such as 0, 10, 20, 30, etc.) and then count the number of values that fall within each interval. This number is called the “frequency.” If you divide each of these frequencies by the size of the data set, n, making percents, then you have what are called “relative frequencies.”
•Draw and label a 1st quadrant graph using scales appropriate for the data. Be sure to include a title for the x- and for the y-axes.
•Graph the frequencies that
you calculated when you counted the values in each interval.

[Figure: Histogram of Student GPAs, with GPA on the horizontal axis and Frequency on the vertical axis]

Categorical Data:
•Bar Graph
•Circle Graph (Pie Chart)
I’m assuming that you already know how to make these two types of graphs. If you need help, you can search the internet for directions.

IV Assessing the Shape of a Graph

There are two basic shapes that we will examine: Symmetric and Skewed.

Symmetric: One can tell that a graph is symmetric if a vertical line in the “center” divides the graph into two fairly congruent shapes. (A graph does not have to be “bell-shaped” to be considered symmetric.) Mean ≈ Median in a symmetric distribution.

Skewed: One can tell that a graph is skewed if the graph has a big clump of data on either the left (skewed right) or on the right (skewed left) with a tendency to get flatter and flatter as the values of the data increase (skewed right) or decrease (skewed left). A common misconception is that the “skewness” occurs at the big clump; the direction of the skew is the direction of the long, flat tail.

Relationship between Mean and Median in a skewed distribution:
Skewed Left, the mean is Less (than the median).
Skewed Right, the mean is Might (greater than the median).

Gathering Information from a Graphical Display

The first thing that should be done after gathering data is to examine it graphically and numerically to find out as much information about the various features of the data as possible. These features will be important when choosing which procedures are appropriate for answering the question that is being investigated. The features that are the most important are Center, Unusual Features, Shape, and Spread: CUSS. Most of these can only be seen in a graph. However, sometimes the shape is indistinct (difficult to discern). In this instance (usually because of a very small set of data), it’s appropriate to label the shape “indistinct.”

Appendix 3: Letters from Former Students

Dear Future AP Students,

My name is Jack Kitto, and I have almost made it through Ms. Poland’s AP Statistics class. I’m writing to you youngbloods to offer invaluable advice on how to not only survive,
but excel in this challenging class.

First off, AP Statistics is not an easy class. If you were expecting an easy B+ or A, think again. It may be easier than AB Calculus, but it is much more challenging than other AP classes such as AP Economics, AP Government, and AP Psychology. Although the actual math that is used to solve statistics problems isn’t terribly complex, you have to be very methodical and precise to receive full credit on tests. In AP Stats, you must memorize many formulas and calculator commands to succeed. If you are still reading this, you may now be hesitant to enroll in this class, but if you aren’t soft, you will man-up and take this class.

I will now go over various strategies to conquer AP Statistics. First off, do your homework on a regular basis. If possible, do some homework EVERY DAY. It sounds cliché, but this really is imperative to your success. If you do your stats homework every day, and understand it, conservatively speaking, I will guarantee that you will get a B+ or higher in the class. I know it requires a lot of motivation and drive to do stats every single day of your life, but this amount of work will get you the grades you want. I will be honest with you though: if you don’t do your homework and cram before tests (like I did), you can squeak out a B, but you will live a very stressful life and will live in constant emotional turmoil. Now if you decide to not do your homework or even study, then expect a swift D or F. That type of behavior will get you times out of 10. In conclusion, don’t be that guy. Do your homework.

Another invaluable lesson I can offer you squids is to take full advantage of the generous amount of classwork that Ms. Poland gives you. These assignments are basically free points. If you aren’t doing so hot on tests/quizzes, the classwork assignments can bail you out big time. On the other hand, if you dominate on tests, make sure you aren’t throwing away the “free points.” Although at times you may think that Ms. Poland wants you to fail, she
doesn’t. She offers classwork assignments for a reason. If you follow my advice, there is no reason why you shouldn’t be successful in AP Statistics. Take the class; you won’t regret it.

Sincerely,
Jack Kitto

Dear Future AP Statistics student,

There are a lot of students who take AP Stats just to avoid calculus or any other hard math class. AP Stats is not an easy class. Just like anything else in life, it takes dedication and effort. If you think you are going to slide by doing the bare minimum and still receive an A, you are probably wrong. Although I think this is a difficult class, it is an important class. Statistics are involved in everyday life and most professions use them in some way. Taking this class will give you the knowledge of what you will probably need to take in college at some point. Also, doing well on the AP test can save you money and time because you will not have to take the class in college.

I wish I had taken advantage of advanced placement classes throughout my high school experience. I look back now as the year is finishing up and I realize that if I had tried a little harder, I would have a better grade. My advice to you is to try. Do all the homework, even if it is optional. The units where I did the homework are the ones whose tests I did noticeably better on. I also advise you to take advantage of the teachers who are willing to help the students. Yes, you may want to sleep in an extra thirty minutes, but every now and then show dedication. Be the type of student who takes advantage of anything they can to become a better student.

The class usually consists of taking notes and doing labs. Everything seems to be related to the real world. The probability unit is what you should look forward to because there are multiple candy labs. I know that both teachers made the year fun and kept the class engaged. Everyone can succeed in this class as long as you get your work done and you show effort.

As for the AP exam, I did not take it because it was not accepted at the university I was
attending. I highly recommend taking the exam if the college you are interested in takes the credit. From what I have heard from friends and peers, the exam was not hard at all. You can get multiple questions wrong and still receive a 5! The AP Stats teachers do an amazing job of reviewing all the material at the end of the year so you will be prepared! Another thing you can do to prepare for the test is to get review books and go through them.

Overall, AP Statistics is the type of class you may dislike throughout the year but at the end of the year will be glad you took. It will get difficult at times just like any other class, but dedication is key. Good luck and have a great school year!

Sincerely,
Old AP Statistics student
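
Optional check of the Section II examples: if you would like to verify the mean, median, range, IQR, and five-number summary on a computer rather than on the calculator, the following is a minimal Python sketch, not part of the original packet. It assumes the quartile rule described in Section II (Q1 and Q3 are the medians of the lower and upper halves, with the overall median left out of both) and the 1.5*(IQR) outlier rule from the modified boxplot note. The function names `five_number_summary`, `summarize`, and `outliers` are made up for this illustration, and other software may use a slightly different quartile rule.

```python
from statistics import mean, median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    and the overall median is left out of both halves when n is odd.
    """
    values = sorted(data)
    if len(values) < 2:
        raise ValueError("need at least two data values")
    half = len(values) // 2  # size of each half, middle value excluded
    lower, upper = values[:half], values[-half:]
    return values[0], median(lower), median(values), median(upper), values[-1]


def summarize(data):
    """Collect the measures of center and spread into a dictionary."""
    low, q1, med, q3, high = five_number_summary(data)
    return {
        "mean": mean(data),
        "median": med,
        "range": high - low,
        "Q1": q1,
        "Q3": q3,
        "IQR": q3 - q1,
    }


def outliers(data):
    """Return values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    fence = 1.5 * (q3 - q1)
    return [x for x in sorted(data) if x < q1 - fence or x > q3 + fence]


if __name__ == "__main__":
    for sample in ([4, 36, 10, 22, 9], [4, 36, 10, 22, 9, 43]):
        print(sample, summarize(sample), "outliers:", outliers(sample))
```

For the data 4, 36, 10, 22, 9 this reproduces the worked values from Section II (mean 16.2, median 10, range 32, IQR 22.5), and for 4, 36, 10, 22, 9, 43 it gives median 16 and IQR 27. Neither data set has any outliers under the 1.5*(IQR) rule.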